Methods for Generating and Screening Novel Metabolic Pathways

1. FIELD OF THE INVENTION
The present invention relates to a novel approach to
drug discovery. More particularly, the invention relates to
a system for preserving the genomes of organisms that are
good or promising sources of drugs; for randomly combining
genetic materials from one or more species of organisms to
generate novel metabolic pathways; and for pre-screening or
screening such genetically engineered cells for the
generation of novel biochemical pathways and the production
of novel classes of compounds. The novel or reconstituted
metabolic pathways can have utility in commercial production
of the compounds.

2. BACKGROUND OF THE INVENTION
2.1. SOURCES OF DRUG LEADS
The basic challenges in drug discovery are to identify a
lead compound with the desirable activity, and to optimize
the lead compound to meet the criteria required to proceed
with further drug development. One common approach to drug
discovery involves presenting macromolecules implicated in
causing a disease (disease targets) in bioassays in which
potential drug candidates are tested for therapeutic
activity. Such molecules could be receptors, enzymes or
transcription factors.

Another approach involves presenting whole cells or
organisms that are representative of the causative agent of
the disease. Such agents include bacteria and tumor cell
lines.

Traditionally, there are two sources of potential drug
candidates: collections of natural products and synthetic
chemicals. Identification of lead compounds has been
achieved by random screening of such collections, which
encompass as broad a range of structural types as possible.
The recent development of synthetic combinatorial chemical
libraries will further increase the number and variety of
compounds available for screening. However, the diversity in
any synthetic chemical library is limited by human
imagination and skills of synthesis.

Random screening of natural products from sources such
as terrestrial bacteria, fungi, invertebrates and plants has
resulted in the discovery of many important drugs (Franco et
al. 1991, Critical Rev Biotechnol 11:193-276; Goodfellow et
al. 1989, in "Microbial Products: New Approaches", Cambridge
University Press, pp. 343-383; Berdy 1974, Adv Appl Microbiol
18:309-406; Suffness et al. 1988, in Biomedical Importance of
Marine Organisms, D.G. Fautin, California Academy of
Sciences, pages 151-157). More than 10,000 of these natural
products are biologically active and at least 100 of these
are currently in use as antibiotics, agrochemicals and
anti-cancer agents. The success of this approach to drug
discovery depends heavily on how many compounds enter a
screening program. Typically, pharmaceutical companies
screen compound collections containing hundreds of thousands
of natural and synthetic compounds. However, the ratio of
novel to previously-discovered compounds has diminished with
time. In screens for anti-cancer agents, for example, most
of the microbial species which are biologically active may
yield compounds that are already characterized. This is due
in part to the difficulties of consistently and adequately
finding, reproducing and supplying novel natural product
samples. Since biological diversity is largely due to
underlying molecular diversity, insufficient biological
diversity in the organisms currently selected for random
screening reduces the probability that novel compounds will
be isolated.

Novel bioactivity has consistently been found in various
natural sources. See, for example, Cragg et al., 1994 (in
"Ethnobotany and the search for new drugs", Wiley,
Chichester, pp. 178-196). Few of these sources have been
explored systematically and thoroughly for novel drug leads.
For example, it has been estimated that only 5000 plant
species have been studied exhaustively for possible medical
use. This is a minor fraction of the estimated total of
250,000-3,000,000 species, most of which grow in the tropics
(Abelson 1990, Science 247:513). Moreover, out of the
estimated millions of species of marine microorganisms, only
a small number have been characterized. Indeed, there is
tremendous biodiversity that remains untapped as a source of
lead compounds.

Terrestrial microorganisms, fungi, invertebrates and
plants have historically been used as sources of natural
products. However, apart from several well-studied groups of
organisms, such as the actinomycetes, which have been
developed for drug screening and commercial production,
reproducibility and production problems still exist. For
example, the antitumor agent, taxol, is a constituent of the
bark of mature Pacific yew trees, and its supply as a
clinical agent has caused concern about damage to the local
ecological system. Taxol contains 11 chiral centers with
2048 possible diastereoisomeric forms so that its de novo
synthesis on a commercial scale seems unlikely (Phillipson,
1994, Trans Royal Soc Trop Med Hyg 88 Suppl 1:17-19).
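
The figure of 2048 forms follows from counting two possible
configurations at each of the 11 chiral centers. A minimal
sketch of that arithmetic is given below; the function name is
hypothetical, and the count is an upper bound that assumes
every center can be inverted independently (no symmetry or
ring constraints are taken into account).

```python
def max_stereoisomers(chiral_centers: int) -> int:
    """Upper bound on stereoisomers: two configurations per chiral center."""
    if chiral_centers < 0:
        raise ValueError("number of chiral centers must be non-negative")
    return 2 ** chiral_centers


# 11 chiral centers, as stated in the text for taxol.
taxol_forms = max_stereoisomers(11)
print(taxol_forms)
```

Only one of these forms is the natural product, which
illustrates why a stereoselective total synthesis at
commercial scale was considered unlikely.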

Marine invertebrates are a promising source of novel
compounds but there exist major weaknesses in the technology
for conducting drug screens and large-scale resupply. For
instance, marine invertebrates can be difficult to recollect,
and many have seasonal variability in natural product
content.

Marine microorganisms are a promising source of novel
compounds but there also exist major weaknesses in the
technology for conducting drug screens and industrial
fermentation with marine microorganisms. For instance,
marine microorganisms are difficult to collect, establish and
maintain in culture, and many have specialized nutrient
requirements. A reliable source of unpolluted seawater is
generally essential for fermentation. It is estimated that
at least 99% of marine bacteria species do not survive on
laboratory media. Furthermore, available commercial
fermentation equipment is not optimal for use in saline
conditions, or under high pressure.

Furthermore, certain compounds appear in nature only
when specific organisms interact with each other and the
environment. Pathogens may alter plant gene expression and
trigger synthesis of compounds, such as phytoalexins, that
enable the plant to resist attack. For example, the wild
tobacco plant Nicotiana sylvestris increases its synthesis of
alkaloids when under attack from larvae of Manduca sexta.
Likewise, fungi can respond to phytoalexins by detoxification
or preventing their accumulation. Such metabolites will be
missed by traditional high-throughput screens, which do not
evaluate a fungus together with its plant host. A dramatic
example of the influence of the natural environment on an
organism is seen with the poison dart frog. While a lethal
dose of the sodium channel agonist alkaloid, batrachotoxin,
can be harvested by rubbing the tip of a blow dart across the
glandular back of a field specimen, batrachotoxin could not
be detected in second generation terrarium-reared frogs
(Daly, 1995, Proc. Natl. Acad. Sci. 92:9-13). If only
traditional drug screening technologies are applied,
potentially valuable molecules such as these may never be
discovered.

Moreover, a lead compound discovered through random
screening rarely becomes a drug, since its potency,
selectivity, bioavailability or stability may not be
adequate. Typically, a certain quantity of the lead compound
is required so that it can be modified structurally to
improve its initial activity. However, current methods for
synthesis and development of lead compounds from natural
sources, especially plants, are relatively inefficient.
There are significant obstacles associated with various
stages of drug development, such as recollection, growth of
the drug-producing organism, dereplication, strain
improvement, media improvement, and scale-up production.
These problems delay clinical testing of new compounds and
affect the economics of using these new sources of drug
leads.

At present, the above-mentioned marine, botanical and
animal sources of natural products are underused. The
currently available methods for producing and screening lead
compounds cannot be applied efficiently to these
under-explored sources. Unlike some terrestrial bacteria and
fungi, these drug-producing organisms are not readily
amenable to industrial fermentation technologies.
Simultaneously, the pressure for finding novel sources for
drugs is intensified by new high-efficiency and
high-throughput screening technologies. Therefore, there is a
general need for methods of harnessing the genetic resources
and chemical diversity of these as yet untapped sources of
compounds for the purpose of drug discovery.

2.2. EXPRESSION LIBRARIES
Most recently, drug discovery programs have shifted to
mechanism-based discovery screens. Once a molecular target
is identified (e.g., a hormone receptor involved in
regulating the disease), assays are designed to identify
and/or synthesize therapeutic agents that interact at a
molecular level with the target.

Gene expression libraries are used to identify,
investigate and produce the target molecules. Expression
cloning has become a conventional method for obtaining the
target gene encoding a single protein without knowing the
protein's physical properties.

Many proteins identified by screening gene expression
libraries prepared from human and mammalian tissues are
potential disease targets, e.g., receptors (Simonsen et al.
1994, Trends Pharmacol Sci 15:437-441; Nakayama et al. 1992,
Curr Opin Biotechnol 3:497-505; Aruffo, 1991, Curr Opin
Biotechnol 2:735-741), and signal-transducing proteins
(Margolis et al., US 5,434,064). See Seed et al., 1987, Proc
Natl Acad Sci 84:3365-3369; Yamasaki et al., 1988, Science
241:825-828; and Lin et al., 1992, Cell 66:775-785 (type III
TGF-β receptor) for examples of proteins identified by
functional expression cloning in mammalian cells.

Once a disease target is identified, the protein target
or engineered host cells that express the protein target have
been used in biological assays to screen for lead compounds
(Luyten et al. 1993, Trends Biotechnol 11:247-54). Thus,
within the scheme of drug discovery, the use of gene
expression libraries has been largely limited to the
identification and production of potential protein disease
targets. Only in those instances where the drug is a protein
or small peptide, e.g., antibodies, have expression libraries
been prepared in order to generate and screen for molecules
having the desirable biological activity (Huse et al. 1991,
Ciba Foundation Symp 159:91-102).

However, there are other applications of gene expression
libraries that are relevant to drug discovery. Gene
libraries of microorganisms have been prepared for the
purpose of identifying genes involved in biosynthetic
pathways that produce medicinally-active metabolites and
specialty chemicals. These pathways require multiple
proteins (specifically, enzymes), entailing greater
complexity than the single proteins used as drug targets.
For example, genes encoding pathways of bacterial polyketide
synthases (PKSs) were identified by screening gene libraries
of the organism (Malpartida et al. 1984, Nature 309:462;
Donadio et al. 1991, Science 252:675-679). PKSs catalyze
multiple steps of the biosynthesis of polyketides, an
important class of therapeutic compounds, and control the
structural diversity of the polyketides produced. A
host-vector system in Streptomyces has been developed that
allows directed mutation and expression of cloned PKS genes
(McDaniel et al. 1993, Science 262:1546-1550; Kao et al.
1994, Science 265:509-512). This specific host-vector system
has been used to develop more efficient ways of producing
polyketides, and to rationally develop novel polyketides
(Khosla et al., WO 95/08548).

Another example is the production of the textile dye,
indigo, by fermentation in an E. coli host. Two operons
containing the genes that encode the multienzyme biosynthetic
pathway have been genetically manipulated to improve
production of indigo by the foreign E. coli host (Ensley et
al. 1983, Science 222:167-169; Murdock et al. 1993,
Bio/Technology 11:381-386). Overall, conventional studies of
heterologous expression of genes encoding a metabolic pathway
involve directed cloning, sequence analysis, designed
mutations, and rearrangement of specific genes that encode
proteins known to be involved in previously characterized
metabolic pathways.

In view of numerous advances in the understanding of
disease mechanisms and identification of drug targets, there
is an increasing need for innovative strategies and methods
for rapidly identifying lead compounds and channeling them
toward clinical testing.

3. SUMMARY OF THE INVENTION
The present invention provides a drug discovery system
for generating and screening molecular diversity for the
purpose of drug discovery. The method of the invention
captures and preserves in combinatorial gene expression
libraries the genetic material of organisms that are known or
prospective sources of drug leads.

In one embodiment, the invention involves the
construction of combinatorial natural pathway gene expression
libraries from one or more species of donor organisms
including microbes, plants and animals, especially those that
cannot be recovered in substantial amounts in nature, or be
cultured in the laboratory. The donor organisms in the pool
may be selected on the basis of their known biological
properties, or they may be a mixture of known and/or
unidentified species of organisms collected from nature.
Random fragments of the genomes of donor organisms, some of
which contain entire biochemical pathways or portions
thereof, are cloned and expressed in the host organisms.
According to the invention, a subset of the gene
products of the transferred DNA are capable of functioning in
the host organism. The naturally-occurring pathways of the
donor organisms may thus be reconstituted in the host
organism. The expression of donor genes in the dissimilar
physiological and regulatory environment of a heterologous
host can unmask otherwise silent metabolic pathways. The
metabolic pathways of the donor organism may also interact
with metabolic pathways resident in the host organism to
generate novel compounds or compounds not normally produced
by the host organism.

Moreover, because only a defined subset of donor
organism genes is expressed in the host organism at any one
time, the system can render metabolic pathways and compounds
easier to detect against an already characterized
biochemical/cellular background of the host organism.
Essentially, the genetic resources of these donor organisms
are captured and preserved in the gene expression libraries,
which can be replicated and used repeatedly in different drug
discovery programs.

In another embodiment, the invention involves the
construction of combinatorial chimeric pathway expression
libraries in which genetic material derived from one or more
species of donor organism is randomly combined, cloned, and
expressed in the host organism. Such libraries generate
random combinations of genes from multiple pathways and
organisms, which gives rise to metabolic pathways and
discrete gene sets previously non-existent in nature. The
term "discrete gene set" refers to any assemblage of two or
more genes obtained from the ligation of genes from one or
more pathways or organisms in a combinatorial gene expression
library. The plurality of gene products are capable of
functioning in the host organism, where they interact to form
novel chimeric metabolic pathways that produce novel classes
of compounds. Thus, the diversity of molecular structures
available for drug screening is increased by mixing the
genetic material of the extant pathways and organisms in the
combinatorial chimeric gene expression library.
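
The scale of diversity produced by random combination can be
illustrated with a small sketch. It is not taken from the
specification: the donor and gene names are hypothetical, and
it assumes that cassettes are drawn with replacement, that
order matters, and that orientation is ignored. Five cassettes
per construct is used only because the construct depicted in
Figure 2 contains five.

```python
import random


def count_ordered_concatemers(pool_size: int, cassettes_per_construct: int) -> int:
    """Number of ordered concatemers when cassettes are drawn with replacement."""
    return pool_size ** cassettes_per_construct


def random_concatemer(gene_pool, cassettes_per_construct, rng):
    """Draw one random ordered set of gene cassettes from the pooled donor genes."""
    return [rng.choice(gene_pool) for _ in range(cassettes_per_construct)]


# Hypothetical pool: four genes from each of three donor organisms.
gene_pool = [
    f"{donor}_gene{i}"
    for donor in ("donorA", "donorB", "donorC")
    for i in range(1, 5)
]

rng = random.Random(0)  # fixed seed so the draw is repeatable
construct = random_concatemer(gene_pool, 5, rng)
total = count_ordered_concatemers(len(gene_pool), 5)

print(construct)
print(total)
```

Even this small pool of twelve genes yields several hundred
thousand possible five-cassette constructs, most of which
combine genes from more than one donor.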

While standard methods of screening gene expression
libraries can be used, the libraries can be further modified
to incorporate a reporter regimen tailored to identify clones
that are expressing the desirable pathways and metabolic
products. In a specific embodiment, the host organisms are
engineered to include a gene encoding a reporter protein
operatively associated with a chemoresponsive promoter that
responds to the desirable class of metabolites to be detected
in the expression library.

In an alternative embodiment, the host organism may be
exposed to a physiological probe which is a precursor of a
reporter molecule that is converted directly or indirectly to
the reporter molecule by a compound produced in the pathway
sought. Activation of expression of the reporter or
conversion of a reporter precursor produces a signal that
allows for identification and isolation of the desirable
clones.

In yet another embodiment of the invention, the host
organisms in the library may be embedded in a semi-solid
matrix with a reporter regimen or another indicator cell type
that contains an assay or is itself a target for the
desirable compound, e.g., pathogens for anti-infectives, or
cancer cells for antitumor agents. High-throughput screening
processes can be used, e.g., macrodroplet sorting,
fluorescence activated cell sorting or magnetic activated
cell sorting, to identify and isolate the desired organisms
in a combinatorial gene expression library.
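
The selection logic behind a reporter-based sort can be
sketched as follows. This is an illustration only, not the
procedure of the specification: the clone names and signal
values are invented, and the gate (a fixed multiple of the
brightest control cell) is an assumed rule chosen for
simplicity.

```python
def gate_threshold(control_signals, margin=1.5):
    """Set the sort gate above the brightest control cell (assumed rule)."""
    return max(control_signals) * margin


def sort_positive(clone_signals, threshold):
    """Keep only clones whose reporter signal exceeds the gate."""
    return {
        clone: signal
        for clone, signal in clone_signals.items()
        if signal > threshold
    }


# Hypothetical reporter readings for host cells without donor DNA.
control = [10.0, 12.0, 9.5, 11.0]

# Hypothetical readings for library clones carrying donor DNA.
library = {
    "clone_1": 11.0,
    "clone_2": 95.0,
    "clone_3": 14.0,
    "clone_4": 40.0,
}

threshold = gate_threshold(control)
positives = sort_positive(library, threshold)
print(threshold, sorted(positives))
```

In practice the gate would be set on the instrument from the
measured distribution of control cells, and sorted clones
would be regrown and sorted again to reduce false positives.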

The positive clones may be further analyzed for the
production of novel compounds. The genetics and biochemistry
of the metabolic pathway that leads to production of the novel
compounds may be delineated by characterizing the genetic
material that was introduced into the isolated clones.

The present invention also relates to recombinant DNA
vectors useful for constructing combinatorial gene expression
libraries, specific combinatorial gene expression libraries,
host organisms containing a particular type of reporter
system, host organisms modified for facilitating production
of otherwise toxic compounds, and compositions comprising
host organisms, indicator cells and/or a reporter regimen.

3.1. DEFINITIONS
As used herein, the following terms will have the
meanings indicated.

A "combinatorial natural pathway expression library" is
a library of expression constructs prepared from genetic
material derived from a plurality of species of donor
organisms, in which genes present in the genetic material are
operably associated with regulatory regions that drive
expression of the genes in an appropriate host organism. The
combinatorial expression library utilizes host organisms that
are capable of producing functional gene products of the
donor organisms. The genetic material in each host
organism encodes naturally-occurring biochemical pathways or
portions thereof from one of the donor organisms.

A "combinatorial chimeric pathway expression library" is
a library of expression constructs prepared from randomly
concatenated genetic material derived from one or more
species of donor organisms, in which genes present in the
genetic material are operably associated with regulatory
regions that drive expression of the genes in an appropriate
host organism. The host organisms used are capable of
producing functional gene products of the donor organisms.

A "biased combinatorial gene expression library" is a
library of expression constructs prepared from genetic
material derived from one or more species of donor organisms,
which has been preselected for a specific property. The
preselected genetic material can be used to prepare
combinatorial natural pathway or chimeric libraries.

As used herein, the term "library" refers to expression
constructs or host organisms containing the expression
constructs.

The terms "biochemical pathway", "natural pathway" and
"metabolic pathway" encompass any series of related
biochemical reactions that are carried out by an organism.
Such pathways may include but are not limited to biosynthetic
or biodegradative pathways, or pathways of energy generation
or conversion.

A "compound" is any molecule that is the result or
by-product of a biochemical pathway, and is usually the
product of interactions of a plurality of gene products.

An "activity" is the capability of a host organism to
carry out a biochemical reaction or a series of biochemical
reactions leading to the production of a compound of
interest.

As used in the present invention, the following
abbreviations will apply: eq (equivalents); M (Molar); mM
(millimolar); μM (micromolar); N (Normal); mol (moles); mmol
(millimoles); μmol (micromoles); nmol (nanomoles); kg
(kilograms); gm (grams); mg (milligrams); μg (micrograms); ng
(nanograms); L (liters); mL (milliliters); μl (microliters);
vol (volumes); s (seconds); and °C (degrees Centigrade).

In addition, the following abbreviations are used:
cfu: colony forming units; LB: Luria Broth; ddH2O:
double-distilled, reverse osmosis purified water; sea H2O:
filtered Pacific seawater; SSW: synthetic seawater; FACS:
fluorescence-activated cell sorting; GFP: Aequorea victoria
green fluorescent protein; kbp: kilobase pairs; g: gravity;
rpm: rotations per minute; CIAP: calf intestinal alkaline
phosphatase; EDTA: ethylenediamine tetraacetic acid; TE: 10 mM
Tris/1.5 mM EDTA pH 7.4; PEG: polyethylene glycol; E. coli:
Escherichia coli; CHO: Chinese hamster ovary; S. cerevisiae:
Saccharomyces cerevisiae; A. nidulans: Aspergillus nidulans;
S. pombe: Schizosaccharomyces pombe; S. lividans:
Streptomyces lividans; S. aureus: Staphylococcus aureus; S.
coelicolor: Streptomyces coelicolor; B. subtilis: Bacillus
subtilis; BAC: bacterial artificial chromosome; YAC: yeast
artificial chromosome; PCR: polymerase chain reaction; CaMV:
cauliflower mosaic virus; AcNPV: Autographa californica
nuclear polyhedrosis virus; EBV: Epstein-Barr virus; SDS:
sodium dodecyl sulfate; CsCl: cesium chloride.

4. DESCRIPTION OF THE FIGURES
Figure 1: Expression construct for combinatorial
natural pathway expression library. The expression construct
contains vector DNA and a donor DNA fragment that comprises
genes encoding a metabolic pathway and natively associated
regulatory regions.

Figure 2: Expression construct for combinatorial
chimeric pathway expression library. The expression
construct contains vector DNA and five concatenated gene
cassettes each comprising donor DNA and regulatory region.

Figure 3: A cloning strategy for combinatorial natural
pathway expression library. Clonable DNA (B) extracted
from donor organisms (A) is partially digested with a
restriction enzyme to generate fragments of genomic DNA (C)
encoding naturally-occurring biochemical pathways or portions
thereof. A DNA vector (D) digested with a restriction enzyme
to generate a vector having compatible ends (E) is ligated to
the fragments of genomic DNA to form expression constructs.

Figures 4A-4C: Assembly of a gene cassette. Figure 4A
depicts an annealed, phosphorylated lac promoter fragment
containing a cohesive BamHI site and a blunt end
corresponding to a portion of a SmaI site. Figure 4B depicts
a promoter dimer containing a BamHI site flanked on each side
by a lac promoter. Figure 4C depicts concatenated promoter
fragments.

Figures 5A-5F: Cloning strategy for combinatorial
chimeric pathway expression library. Figure 5A shows the
steps in preparing promoter and terminator fragments for
directional cloning of cDNA and genomic DNA inserts. Figure
5B shows the steps in preparing promoter and terminator
fragments for ligation to genomic DNA inserts. Figure 5C
shows the steps in preparing cDNA inserts for directional
cloning, assembly of gene cassettes, and attachment to solid
support. Figure 5D shows the steps in preparing genomic DNA
inserts for cloning, assembly of gene cassettes and
attachment to solid support. Figures 5E and 5F show the
serial ligation and deprotection of gene cassettes to form a
concatemer, the ligation of the concatemer to an S. pombe/E.
coli shuttle vector (pDblet), release of the expression
construct from the solid support and circularization of the
expression construct.

Figures 6A-6B: Vectors useful for preparing
combinatorial gene expression libraries. Figure 6A shows a
map of Streptocos. The cosmid vector Streptocos contains a
unique BamHI site flanked by T3 and T7 promoters in the
multiple cloning site, the origin of replication and
thiostrepton resistance gene from pIJ699, a ColE1 origin
(ori), an ampicillin resistance gene (Amp) and two cos sites.
Figure 6B shows a map of modified pDblet. The plasmid pDblet
is modified in the multiple cloning site (MCS), and contains a
ColE1 origin of replication, an ampicillin resistance gene
(ApR), two copies of autonomous replicating sequence (ARS), an
ura4 marker, and the β-galactosidase gene (LacZ). A: AatII; B:
BamHI; N: NdeI. Figure 6C shows the oligomer containing an
altered BstXI sequence and a NcoI site, which was ligated in
excess to SacI/NotI cut pDblet to form modified pDblet.

Figure 7 shows a chemoresponsive construct pERD-20-GFP
comprising a reporter gene encoding green fluorescent protein
(GFP), a chemoresponsive promoter (Pm) and its associated
regulator (XylS).

Figure 8 shows a macrodroplet comprising a permeable
matrix, in which is encapsulated a clone from a combinatorial
gene expression library, and an indicator cell which contains
a reporter regimen.

Figures 9A and 9B provide an example of FACS sorting of
a pool of E. coli cells, with and without the presence of
expression constructs comprising marine bacterial genes. E.
coli strain XL1-MR containing the chemoresponsive construct
pERD-20-GFP, referred to as XL1-GFP, was infected with a
cosmid library of marine bacterial genes. The XL1-GFP cells
with or without the marine bacteria genes were cultured for
12 hours at 30°C, and subjected to two cycles of FACS
sorting. Figure 9A: XL1-GFP with marine bacterial genes;
Figure 9B: control XL1-GFP cells.

Figure 10 shows an alignment of the amino acid sequence
of actinorhodin dehydrase of Streptomyces coelicolor, and the
predicted partial amino acid sequence derived from CXC-AMN20.
Plain boxes indicate sequence identity, and shaded boxes
indicate conservative sequence homology.

Figure 11: PCR detection of clone CXC-AMN20 sequences in
pools of genomic DNA of marine bacteria. The figure shows a
stained agarose gel containing PCR amplicons derived from
marine bacteria genomic DNA. M: molecular weight markers,
sizes in bp. -: negative control. +: positive controls for
the amplicon and for ribosomal RNA. The lanes contain
amplicons derived from T: genomic DNA from all 37 species of
marine bacteria; 1, 2, 3, 4: pools of genomic DNA of marine
bacteria.

Figures 12A-12C: PCR detection of clone CXC-AMN20 sequences
in genomic DNA of marine bacteria species. The figures show
stained agarose gels containing PCR amplicons derived from
genomic DNA of individual species of marine bacteria. M:
molecular weight markers, sizes in bp. -: negative control.
+: positive controls for the amplicon and for ribosomal RNA.
The lanes contain amplicons derived from genomic DNA of
marine bacteria: species #1-10, #12-20 and #21-35 in pools 1,
2 and 3, respectively.

5. DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a drug discovery system
that provides methods and compositions for capturing and
preserving the diversity of genetic resources in nature, and
for translating and expanding the captured genetic resources
into diversity of chemical structures. The invention also
facilitates screening for desirable activities and compounds.

More particularly, the invention provides methods for
constructing and screening combinatorial gene expression
libraries. These libraries comprise random assortments of
gene products of multiple species which are in some cases
allowed to interact with each other in the expression host,
and result in some cases in the formation of novel
biochemical pathways and/or the production of novel classes
of compounds. Moreover, the libraries of the invention
provide efficient access to otherwise inaccessible sources of
molecular diversity.

The novel biochemical pathways may carry out processes
including but not limited to structural modification of a
substance, addition of chemical groups to the substance, or
decomposition of the substance.

The novel classes of compound may include but are not
limited to metabolites, secondary metabolites, enzymes, or
structural components of an organism. A compound of interest
may have one or more potential therapeutic properties,
including but not limited to antibiotic, antiviral,
antitumor, pharmacological or immunomodulating properties, or
may be another commercially-valuable chemical such as a
pigment. A compound may serve as an agonist or an antagonist
to a class of receptor or a particular receptor.

As used in the present invention, the term
"combinatorial gene expression library" encompasses
combinatorial natural pathway expression library,
combinatorial chimeric pathway expression library as well as
host organisms containing the libraries of expression
constructs.

A "combinatorial natural pathway expression library" is
a library of expression constructs prepared from genetic
material derived from one or more species of donor organisms,
in which genes present in the genetic material are operably
associated with regulatory regions that drive expression of
the genes in an appropriate host organism. The combinatorial
expression library utilizes host organisms that are capable
of producing functional gene products of the donor organisms.
The genetic material in each host organism encodes
naturally-occurring biochemical pathways or portions thereof
from one of the donor organisms.

A "combinatorial chimeric pathway expression library" is
a library of expression constructs prepared from randomly
concatenated genetic material derived from a plurality of
species of donor organisms, in which genes present in the
genetic material are operably associated with regulatory
regions that drive expression of the genes in an appropriate
host organism. The host organisms used are capable of
producing functional gene products of the donor organisms.
Upon expression in the host organism, gene products of the
donor organism(s) may interact to form novel chimeric
biochemical pathways.

Generally, the methods of the invention comprise
providing genetic material derived from one or more donor
organism(s), manipulating said genetic material, and
introducing said genetic material into a host organism via a
cloning or expression vector so that one or more genes of the
donor organism(s) are transferred to and expressed in the
host organism. Such host organisms containing donor genetic
material are pooled to form a library.

The transferred genetic material typically comprises a
random assortment of genes, the expression of which is driven
and controlled by one or more functional regulatory regions.
The expression construct or vector may provide some of these
regulatory regions. The genes of the donor organism(s) are
transcribed, translated and processed in the host organism to
produce functional proteins that in turn generate the
metabolites of interest.

According to the present invention, gene expression
libraries comprising complete naturally occurring biochemical
pathways or substantial portions thereof can greatly
facilitate searches for donor multi-enzyme systems
responsible for making compounds or providing activities of
interest. Genes that are involved in a particular
biochemical pathway can be conveniently isolated and
characterized in a single expression construct or clone. A
typical arrangement of such an expression construct is shown
in Figure 1.

Once a desirable activity or compound is identified,
this convenient feature can greatly facilitate downstream
drug development efforts, such as strain improvement and
process development. The positive clone can be cultured
under standard conditions to produce the desired compound in
substantial amounts for further studies or uses. The genes
of the biochemical pathway are immediately available for
sequencing, mutation, expression, and further rounds of
screening. The cloned biochemical pathway is readily
amenable to traditional and/or genetic manipulations for
overproduction of the desired compound.

Furthermore, biochemical pathways that are otherwise
silent or undetectable in the donor organism may be
discovered more easily by virtue of their functional
reconstitution in the host organism. Since the biochemical
characteristics of the host organism are well known, many
deviations as a result of expression of donor genetic
material can readily be recognized. Novel compounds may be
detected by comparing extracts of a host organism containing
donor genetic material against a profile of compounds known
to be produced by the control host organism under a given set
of environmental conditions. Even very low levels of a
desirable activity or compound may be detected when the
biochemical and cellular background of the host organism is
well characterized. As described in later sections, the
present invention provides methods for detecting and
isolating clones that produce the desirable activity or class
of compounds.
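
The comparison described above, in which an extract of a library clone is checked against the compound profile of the control host, can be illustrated with a short sketch. This is only an illustration and not a procedure taken from this specification: it assumes each profile is a list of chromatographic peaks given as (retention time in minutes, peak area), and the names `novel_peaks`, `rt_tolerance` and `min_area` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (retention time in minutes, peak area).
Peak = Tuple[float, float]


def novel_peaks(clone: List[Peak], control: List[Peak],
                rt_tolerance: float = 0.1, min_area: float = 0.0) -> List[Peak]:
    """Return clone peaks that have no control-host peak within rt_tolerance minutes."""
    control_times = [rt for rt, _ in control]
    result = []
    for rt, area in clone:
        if area < min_area:
            continue  # skip peaks below the chosen noise floor
        if all(abs(rt - c) > rt_tolerance for c in control_times):
            result.append((rt, area))  # no matching peak in the control profile
    return result


if __name__ == "__main__":
    control_host = [(2.10, 500.0), (5.40, 1200.0), (9.85, 300.0)]
    library_clone = [(2.12, 480.0), (5.38, 1150.0), (7.60, 90.0), (9.90, 310.0)]
    print(novel_peaks(library_clone, control_host))
```

In this made-up data set only the small peak at 7.60 minutes has no counterpart in the control profile, which reflects the point that a well characterized host background allows even low-level deviations to be recognized.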

In a preferred embodiment, the methods may be applied to
donor organism(s) that cannot be recovered in substantial
amounts in nature, or cultured in the laboratory. By
transferring genetic material from such organisms into a host
organism, the organisms' metabolic pathways can be
reproduced, and their products tested efficiently for any
desirable properties. Thus, the genetic diversity of these
organisms is captured and preserved.

In another embodiment of the invention, a combinatorial
chimeric pathway gene expression library can be constructed
in which the genetic materials from one or multiple donor
organisms are randomly concatenated prior to introduction
into the host organism. Thus, each host organism in the
library may individually contain a unique, random combination
of genes derived from the various donor pathways or
organisms. Figure 2 shows the arrangement of genes and
regulatory regions in an expression construct of a
combinatorial chimeric pathway gene expression library. For
the most part, such combinations of genes in the library do
not occur in nature. Upon expression, the functional gene
products of the various donor pathways or organisms interact
with each other in individual host organisms to generate
combinations of biochemical reactions which result in novel
chimeric metabolic pathways and/or production of novel
compounds. Collectively, the genetic resources of the donor
organisms in the library are translated into a diversity of
chemical compounds that may not be found in individual donor
organisms.
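
The random concatenation described above can be pictured with a small simulation. The sketch below is a simplified model and not the laboratory procedure: it assumes gene fragments can be treated as labeled tokens pooled from all donors, it ignores fragment orientation and repeated incorporation of the same fragment, and the function name `random_concatemers` and its parameters are hypothetical.

```python
import random
from typing import Dict, List


def random_concatemers(donor_genes: Dict[str, List[str]], n_constructs: int,
                       genes_per_construct: int, seed: int = 0) -> List[List[str]]:
    """Build constructs by drawing gene fragments at random from a pooled donor set."""
    rng = random.Random(seed)  # seeded so that the illustration is repeatable
    # Pool the fragments of all donors, tagging each with its donor of origin.
    pool = [f"{donor}:{gene}" for donor, genes in donor_genes.items() for gene in genes]
    if genes_per_construct > len(pool):
        raise ValueError("not enough fragments in the pool")
    # Each construct is an ordered random selection, without repeats inside a construct.
    return [rng.sample(pool, genes_per_construct) for _ in range(n_constructs)]


if __name__ == "__main__":
    donors = {"donorA": ["a1", "a2", "a3"], "donorB": ["b1", "b2"]}
    for construct in random_concatemers(donors, n_constructs=4, genes_per_construct=3):
        print(" - ".join(construct))
```

Because fragments are drawn from the pooled set, a single construct can carry genes from more than one donor, which is the property that allows chimeric pathways to arise in an individual host organism.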

In another aspect of the invention, the species of donor
organisms may be selected on the basis of their biological
characteristics, or ability to carry out desirable but
uncharacterized biochemical reactions that are complementary
to the host organism. Such desirable characteristics may
include, but are not limited to, the capability to utilize
certain nutrients, to survive under extreme conditions, to
derivatize a chemical structure, and the ability to break
down or catalyze formation of certain types of chemical
linkages. When genes of the donor organism are expressed in
the host organism, the donor gene products can modify and/or
substitute for the functions of host gene products that
constitute host metabolic pathways, thereby generating novel
hybrid pathways. Novel activities and/or compounds may be
produced by hybrid pathways comprising donor- and host-derived
components. The target metabolic pathway modified by donor
gene products may be native to the host organism.
Alternatively, the target metabolic pathway may be provided
by products of heterologous genes which are endogenous or
have been genetically engineered into every host organism
prior to or contemporaneous with construction of the gene
expression library. Thus, the present invention also
embodies constructing and screening gene expression
libraries, wherein DNA fragments encoding metabolic pathways
of donor organisms are cloned and coexpressed in host
organisms containing a target metabolic pathway.

In another embodiment of the invention, the host
organism may have an enhanced complement of active drug
efflux systems which secrete the compounds of interest into
the culture medium, thus reducing the toxicity of the
compounds to the host organism. Absorptive material, e.g.,
neutral resins, may be used during culturing of the host
organisms, whereby metabolites produced and secreted by the
host organism may be sequestered, thus facilitating recovery
of the metabolites.

In order to make the process of screening combinatorial
gene expression libraries more efficient, the present
invention further provides methods for detecting those host
organisms in the library that possess the activity or
compound of interest. In one embodiment of the invention,
the host organism contains a reporter system that will
respond to the presence of an introduced change, such as the
presence of the desirable compound or activity, by activating
the de novo synthesis of a reporter molecule. In another
embodiment, the host organism contains the precursor of a
reporter molecule, or a physiological probe, which is
converted to the reporter by the presence of the desirable
compound or activity. The reporter molecule in the positive
clone generates a signal which allows detection of the
positive clone in the expression library, as well as its
isolation from the other non-productive clones.
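
As an illustration of how a reporter signal might be used to pick positive clones from a multi-well plate, the sketch below flags wells whose signal exceeds the mean of control-host wells by a chosen number of standard deviations. The cutoff rule, the default factor `k = 3`, and the names `positive_wells`, `signals` and `control_signals` are assumptions made for this example; the specification does not prescribe a particular statistical threshold.

```python
from statistics import mean, stdev
from typing import Dict, List


def positive_wells(signals: Dict[str, float], control_signals: List[float],
                   k: float = 3.0) -> List[str]:
    """Return IDs of wells whose reporter signal exceeds control mean + k * control SD."""
    threshold = mean(control_signals) + k * stdev(control_signals)
    return sorted(well for well, value in signals.items() if value > threshold)


if __name__ == "__main__":
    controls = [100.0, 102.0, 98.0, 101.0, 99.0]   # host organisms without donor DNA
    plate = {"A1": 101.0, "A2": 250.0, "B1": 104.0, "B2": 106.0}
    print(positive_wells(plate, controls))
```

With these made-up readings the cutoff is about 104.7, so wells A2 and B2 would be carried forward for isolation and confirmation.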
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In many respects, the drug discovery system provides 
significant convenience and time advantage to the various
steps of drug development up to clinical trials. The
libraries of the invention are compatible with the
established multi-well footprint format and robotics for
high-throughput screening. The host organisms of the
invention are organisms commonly used for genetic
manipulation and/or process development. The present
invention takes advantage of the fact that such host
organisms or production hosts are well-characterized in terms
of their biological properties and maintenance requirements.
By transferring genetic materials from a donor organism to
other more familiar expression systems, the need for
difficult culturing conditions for the donor organism is
reduced. Thus, the biological activities, the
pharmacokinetic and toxic properties of any lead compound
discovered in the system of the invention may be studied and
optimized more efficiently.

The novel metabolic pathway generated in a positive
clone can be delineated by standard techniques in molecular
biology. The lead compound may be synthesized by culturing a
clone of the drug-producing host organism under standard or
empirically determined culture conditions, so that sufficient
quantities of the lead compound may be isolated for further
analysis and development. There are already high purity
manufacturing protocols, such as Good Manufacturing Practice
(GMP), established for some of these standard industrial host
organisms. Unlike conventional methods of screening natural
product sources, less effort is required to adapt the
screening and production technologies to the particular
requirements of each potential drug-producing organism.

The present invention also provides specific
combinatorial gene expression libraries made according to the
methods of the invention from genetic materials of a
particular set of donor organisms and/or cell types. Not all
organisms or cell types in a set, especially in mixed samples,
need to be individually identified or characterized to enable
preparation of the combinatorial gene expression libraries.

Any combinatorial gene expression library of the
invention may be amplified, replicated, and stored.
Amplification refers to culturing the initial host organisms
containing donor DNA so that multiple clones of the host
organisms are produced. Replication refers to picking and
growing of individual clones in the library. A combinatorial
gene expression library of the invention may be stored and
retrieved by any technique known in the art that is
appropriate for the host organism. Thus, the libraries of
the invention are an effective means of capturing and
preserving the genetic resources of donor organisms, which
may be accessed repeatedly in a drug discovery program.

5.1. PREPARATION OF COMBINATORIAL GENE EXPRESSION LIBRARIES

5.1.1. DONOR ORGANISMS
Any organism can be a donor organism for the purpose of
preparing a combinatorial gene expression library of the
invention. The donor organisms may be obtained from private
or public laboratory cultures, or culture deposits, such as
the American Type Culture Collection, the International
Mycological Institute, or from environmental samples, either
cultivable or uncultivable.

The donor organism(s) may have been a traditional source
of drug leads, such as terrestrial bacteria, fungi and
plants. The donor organisms may be transgenic, genetically
manipulated or genetically selected strains that have been
useful in generating and/or producing drugs.

The donor organism(s) may or may not be cultivable with
current state-of-the-art microbiological techniques; e.g., the
genetic material used to prepare the libraries can be
obtained directly from an environmental sample. Since only a
minority (≤1%) of the microbes found in nature can be
cultured in the laboratory, the major advantage of the
present invention is that the donor organism does not have to
be cultivable to be utilized herein (Torsvik et al. 1990,
Appl Env Micro, 56:782-787).

The invention is not limited to the use of
microorganisms as donors. Plants produce an enormous range
of compounds, some with dramatic activities on both animals
and microorganisms, for example, phytoalexins (Abelson 1990,
Science 247:513). Some of these compounds are inducible by
wounding or elicitors derived from the cell walls of plant
pathogens (Cramer et al. 1985, EMBO J. 4:285-289; Cramer et
al. 1985, Science 227:1240-1243; Dron et al. 1988, Proc.
Natl. Acad. Sci. USA 85:6738-6742). Biologically-active
compounds, like taxol, camptothecin, and artemisinin are
examples of plant-derived natural products which are
undergoing clinical development respectively as anti-tumor
and anti-malarial agents. Any plants, especially those with
potential medicinal properties, may be desirable donor
organisms (Phillipson, 1994, Trans R Soc Trop Med Hyg, 88
Suppl 1:S17-9; Chadwick et al. eds, 1994, in "Ethnobotany and
the search for new drugs", Wiley, Chichester, Ciba Foundation
Symp 185).

Other sources of natural products with potentially
useful antimicrobial or pharmacological properties are
invertebrates and vertebrates. Some of these compounds serve
as chemical defenses against competitors, pathogens and
predators. Such compounds may also be used to kill prey or
used as a form of communication (Caporale 1995, Proc Natl
Acad Sci 92:75-82). In numerous cases, the secondary
metabolites are thought to be produced by associated microbes
that may be symbiotic (Faulkner et al. 1993, Gazzetta Chimica
Italiana, 123:301-307; Bewley et al. 1995, in "An Overview of
Symbiosis in Marine Natural Products Chemistry Symposium" in
honor of Professor Antonio Gonzalez, La Laguna University,
Canary Islands, September 16, 1995, p26 (abstract)).

Organisms known to manipulate biochemical pathways of
other organisms in nature are sources of particular interest,
e.g., certain plants, such as Cycas, can produce an ecdysone
mimic which disrupts the development of certain insects.
Such organisms may live in the same ecological niche where
they exist as competitors, symbionts, predator and prey, or
host and parasite. Thus, it may be advantageous to use
genetic materials derived from organisms that interact
chemically with others in nature.

Yet another rich source of natural products is marine
organisms. For instance, marine microbes produce novel
molecular structures, many of which are bioactive, e.g.,
octalactin A, which is a potential anti-cancer agent with a
molecular structure not previously seen in terrestrial
bacteria (Tapiolas et al. 1991, J Amer Chem Soc, 113:4682-
83); and salinamides (Trischman et al. 1994, J Amer Chem Soc
116:757-758), which have potent anti-inflammatory properties.
Certain compounds derived from marine microorganisms contain
bromine from seawater which renders the compounds highly
active because of the chemical reactivity of the incorporated
halogen, e.g., marinone (Pathirana et al. 1992, Tetrahedron
Lett 33:7663-7666), a product of mixed polyketide and
mevalonic acid biosynthetic pathways, which has selective
antibiotic activity against gram positive bacteria. There is
a vast diversity of marine species which live in a range of
habitats, from polar to tropical regions, with different
salinities, temperatures and pressures. The unique nature of
these habitats is reflected in the distinct genetics and
biochemistry of these organisms, and may provide many useful
drug leads. See, for example, Fenical et al. 1992, in
"Marine Microorganisms; a new biological resource", Adv in
Marine Biotechnol, Vol. I, Plenum Press, New York.

Environmental samples may be obtained from natural or
man-made environments, and may contain a mixture of
prokaryotic and eukaryotic organisms, and viruses, some of
which may be unidentified. Samples can either be randomly
collected or collected from areas that are ecologically
stressed, for example, near an industrial effluent. Soil,
freshwater or seawater filtrates, deposits around hot springs
or thermal vents, and marine or estuarine sediments may be
used as sources of donor organisms. Samples may be collected
from benthic, pelagic, and intertidal marine sources.
Samples may be collected from tropical, subtropical,
temperate and other regions. The donor organisms may be
thermophilic, halophilic, acidophilic, barophilic, or
methanogenic.

It is also preferable to use organisms that are facing
the possibility of extinction, such as those plants and
microorganisms found in the tropical rain forest. Insofar as
such habitats are being destroyed, species are being lost
that might yield useful medicines.

Organisms with medicinal properties, including
algae, lichens, fungi, plants, and animals, may also be
collected on the basis of their uses in traditional or ethnic
medical practices.

In many aspects, it is desirable that the library is
constructed with genetic material derived from donor
organisms that are not generally amenable to traditional drug
discovery and development technologies. Such donor organisms
may have one or more of the following characteristics:
(i) the organism cannot be propagated or cultured in the
laboratory; (ii) the organism cannot be recovered
in amounts sufficient for further experiments; and/or
(iii) the organism requires special conditions for production
of the desirable compound that are unknown or are not
commercially reasonable. The latter characteristics also
describe organisms in extant culture collections, where no
drug leads may have been detected in conventional screening
processes due to inappropriate culture conditions.

For the purpose of constructing an expression library,
the donor organisms need not be taxonomically defined or
biochemically characterized. Identification or genetic
footprinting of a cultivable species or a representative
group of species from an environmental sample may be
performed depending on the complexity of the sample and the
needs of the drug discovery program, such as, for example, a
requirement for donor species dereplication.
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The donor organisms may be concentrated or cultured in 
the laboratory or field prior to extraction of their nucleic
acids. For preparing cDNAs, specific growth conditions or
the presence of certain chemicals in the culture may be
required to induce or enhance the transcription of genes
encoding the activities of interest in the donor
organisms. Standard growth conditions may be used to culture
the organisms if only genomic DNA is required.

Since it is unlikely that all donor organisms in an
environmental sample can be propagated at the same rate, if
at all, under laboratory conditions, some of the donor
organisms may overgrow and lead to the loss or dilution of
slow-growing organisms. Thus, it may be preferable to
prepare nucleic acids directly from donor organisms in an
environmental sample without prior culturing in the
laboratory. This may be especially useful when attempting to
access the secondary metabolites of invertebrates such as
marine sponges, where the metabolites are often believed to
be produced by the associated symbiotic and uncultivable
microbes. Methods for preparing high quality nucleic acids
from donor organisms in environmental samples are provided
below in Section 5.1.2.
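
The dilution of slow-growing organisms during culturing can be made concrete with a simple calculation. The sketch below assumes unrestricted exponential growth with a fixed doubling time for each organism, which is an idealization; the function name `fractions_after_culture` and the example doubling times are hypothetical and are not taken from this specification.

```python
from typing import Dict


def fractions_after_culture(initial: Dict[str, float], doubling_hours: Dict[str, float],
                            hours: float) -> Dict[str, float]:
    """Relative abundance of each organism after idealized exponential growth."""
    # N(t) = N0 * 2 ** (t / doubling time) for every organism in the sample.
    grown = {name: n0 * 2 ** (hours / doubling_hours[name]) for name, n0 in initial.items()}
    total = sum(grown.values())
    return {name: count / total for name, count in grown.items()}


if __name__ == "__main__":
    start = {"fast grower": 1.0, "slow grower": 1.0}        # equal starting numbers
    doubling = {"fast grower": 1.0, "slow grower": 4.0}     # assumed doubling times, hours
    for name, share in fractions_after_culture(start, doubling, hours=24.0).items():
        print(f"{name}: {share:.6%}")
```

Under these assumptions the slow grower falls from half of the population to a few parts per million within a day, so its DNA would be correspondingly rare in a library prepared from the culture. An organism that does not grow at all can be modeled by giving it a doubling time of `float("inf")`.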

Donor organisms contemplated by the invention may
include, but are not limited to, viruses; bacteria;
unicellular eukaryotes, such as yeasts and protozoans; algae;
fungi; plants; tunicates; bryozoans; worms; echinoderms;
insects; mollusks; fishes; amphibians; reptiles; birds; and
mammals. Non-limiting examples of donor organisms are listed
in Tables I and II.

Table I: Exemplary bacterial and fungal donor organisms (Berdy
1974, Adv Appl Microbiol, 18:309-406; Goodfellow et al.
1989, in "Microbial Products: New Approaches", Cambridge
University Press, 343-383)

Group                 Genera

Bacteria
  Actinomycetales     Streptomyces, Micromonospora,
                      Nocardia, Actinomadura,
                      Actinoplanes, Streptosporangium,
                      Microbispora, Kitasatosporia
  Eubacteriales       Azobacterium, Rhizobium,
                      Achromobacterium,
                      Enterobacterium, Brucella,
                      Micrococcus, Lactobacillus,
                      Bacillus, Clostridium,
                      Brevibacterium
  Pseudomonadales     Pseudomonas, Aerobacter,
                      Vibrio, Halobacterium
  Mycoplasmatales     Mycoplasma
  Myxobacteriales     Cytophaga, Myxococcus

Fungi
  Myxothallophytes    Physarum, Fuligo
  Phycomycetes        Mucor, Phytophthora, Rhizopus
  Ascomycetes         Aspergillus, Penicillium
  Basidiomycetes      Coprinus, Phanerochaete
  Fungi Imperfecti    Acremonium (Cephalosporium),
                      Trichoderma, Helminthosporium,
                      Fusarium, Alternaria,
                      Myrothecium

                      Saccharomyces



Table II: Higher forms of exemplary donor organisms

Group                 Exemplary Genera, Compounds & Properties

Plants
  Algae               Digenea simplex (kainic acid,
                      antihelminthic)
                      Laminaria angustata (laminine,
                      hypotensive)
  [Lichens]           Usnea fasciata (vulpinic acid,
                      antimicrobial; usnic acid,
                      antitumor)
  Higher Plants       Catharanthus (Vinca alkaloids),
                      Digitalis (cardiac glycosides),
                      Podophyllum (podophyllotoxin),
                      Taxus (taxol), Cephalotaxus
                      (homoharringtonine),
                      Camptotheca (camptothecin),
                      Artemisia (artemisinin), Coleus
                      (forskolin), Desmodium (K
                      channel agonist)

Protozoa
  Dinoflagellates     Ptychodiscus brevis
                      (brevetoxin, cardiovascular)

[Arthropods]          Dolomedes ("fishing spider"
                      venoms), Epilachna (Mexican
                      bean beetle alkaloids)
[Bryozoans]           Bugula neritina (bryostatins,
                      anti-cancer)
Molluscs              Conus toxins
Sponges               Microciona prolifera (ectyonin,
                      antimicrobial), Cryptotethya
                      crypta (D-arabinofuranosides)
Corals                Pseudopterogorgia species
                      (pseudopterosins, anti-
                      inflammatory), Erythropodium
                      (erythrolides, anti-
                      inflammatory)

Worms
  Annelida            Lumbriconereis heteropa
                      (nereistoxin, insecticidal)
  Sipunculida         Bonellia viridis (bonellin,
                      neuroactive)

Tunicates             Trididemnum solidum (didemnin,
                      anti-tumor and anti-viral),
                      Ecteinascidia turbinata
                      (ecteinascidins, anti-tumor)
[Fish]                Eptatretus stoutii (eptatretin,
                      cardioactive), Trachinus draco
                      (proteinaceous toxins, reduce
                      blood pressure, respiration and
                      reduce heart rate)
Amphibians            Dendrobatid frogs
                      (batrachotoxins, pumiliotoxins,
                      histrionicotoxins, and other
                      polyamines)
Reptiles              Snake venom toxins
Birds                 histrionicotoxins, modified
                      carotenoids, retinoids and
                      steroids (Goodwin 1984 in "The
                      Biochemistry of the
                      Carotenoids" Vol. II, Chapman
                      and Hall, New York, pp. 160-
                      168)
[Mammals]             Ornithorhynchus anatinus (duck-
                      billed platypus venom),
                      modified carotenoids, retinoids
                      and steroids (Goodwin 1984,
                      supra, pp. 173-185; Devlin 1982
                      in "Textbook of Biochemistry"
                      Wiley, New York, p. 750)
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5.1.2. PREPARATION OF HIGH QUALITY NUCLEIC ACIDS 
FROM DONOR ORGANISMS 

Nucleic acids may be isolated from donor organisms by a
variety of methods depending on the type of organisms and the
source of the sample. It is important to obtain high quality
nucleic acids that are free of nicks, single stranded gaps,
and partial denaturation, and are of high molecular weight
(especially for genomic DNA cloning), in order to construct
gene expression libraries that are fully representative of
the genetic information of donor organisms. To prepare high
quality nucleic acid, the methods of the invention provide
gentle, rapid and complete lysis of donor organisms in the
sample, and rapid and complete inactivation of nucleases and
other degradative proteins from the organisms. Initial
extraction may be carried out in the field to stabilize the
nucleic acids in the sample until further isolation steps can
be performed in the laboratory.

Any nucleic acid isolation procedure requires efficient
breakage of the donor organism. A number of standard
techniques may be used, including freezing in liquid
nitrogen, grinding in the presence of glass or other
disruptive agents, as well as simple mechanical shearing or
enzymatic digestion.

For mixed materials such as soil, or for samples that
contain high amounts of tough materials, such as cellulose or
chitin (as in filamentous fungi and plants, for instance),
freeze-drying may be employed to render the samples fragile,
thus making them more amenable to disruption. Such
lyophilized materials preserve both enzymatic as well as high
molecular weight materials (such as nucleic acids) for long
periods (Gurney 1984, in Methods in Molecular Biology, Vol.
2, p35-42, John M. Walker ed.). Samples may be flash frozen
in liquid nitrogen. Samples that are loose, such as soil,
can be frozen in fine gauze or nylon mesh. Lyophilization
can be carried out on frozen samples under vacuum for a
period of 24-72 hours. Freeze-dried samples can be stored
desiccated under vacuum at -70°C. Additional steps may be
required for preparation of environmental samples, such as
concentration of microbial populations (Jacobson et al. 1982,
Appl Env Microbiol, 58:2458-2462; Zhou et al. 1996, Appl Env
Microbiol, 62:316-322; Somerville et al. 1989, Appl Env
Microbiol, 55:548-554).

One principal method of the present invention, though
certainly not the only one to be used, is modified from
Chirgwin et al. (1979, Biochem 24:5294), Sadler et al. (1992,
Curr Genet, 21:409-416) and Foster (1991, Ph.D. thesis,
University of California, Santa Barbara). The method uses
the strong chaotropic agent, guanidinium isothiocyanate, with
2-mercaptoethanol to denature proteins and inactivate
nucleases, followed by purification of the nucleic acid
material by cesium chloride gradient centrifugation. The
method provided herein differs from Chirgwin's method in that
both DNA and RNA are extracted. Also included in the method
of the invention is a high speed centrifugation step, and the
optional addition of bisbenzimide dye. Depending on the
donor organism used, additional steps may include, but are
not limited to, treatment with hexadecylpyridinium chloride
or cetyltrimethylammonium bromide (CTAB) to selectively
remove polysaccharides, treatment with polyvinylpyrrolidone
for removal of phenolics, and cellulose chromatography for
removal of starch and other carbohydrates (Murray & Thompson,
1980, Nuc Acid Res 8:4321-25).

RNA isolated from donor organisms can be converted into 
complementary DNA (cDNA) using reverse transcriptase. 

Damaged nucleic acid may be difficult to clone, resulting
in loss of donor organism DNA and low numbers of clones in a
library. The problem can be worsened if the host organism is
permissive for recombination and lacks effective endogenous
DNA repair mechanisms. The present invention also provides
that damaged DNA can be repaired in vitro prior to cloning,
using enzymatic reactions commonly employed during second
strand synthesis of complementary DNA (Sambrook et al. 1989,
in "Molecular Cloning" 2nd Edition, section 8). For example,
DNA gaps and nicks may be repaired by the Klenow fragment of
DNA polymerase, and E. coli DNA ligase. Such enzymatic
reactions are well known to those skilled in the art.

When preparing a combinatorial expression library from
DNA extracted from environmental samples, the quantity of
available DNA is often limited, and is a consideration in the
selection of ligation strategy. If the quantity is low after
extraction or concatenation (<100 µg), the DNA may be ligated
into a high-efficiency cloning system, e.g., SuperCos, as
described in Section 5.1.3. The inserts in the clones are
amplified and are released from the vector by restriction
enzyme digestion. Due to the nature of environmental DNA
samples, which may contain both prokaryotic and eukaryotic
donor organisms, it may be desirable to use multiple host
organisms. If a sufficient amount of the original environmental
DNA sample is available, or if the DNA has been amplified,
the DNA may be ligated to each of a panel of vectors
appropriate for the desired panel of expression host cells.
Preferably, the vectors have the capacity to shuttle between
two or more expression hosts.
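
The choice of ligation strategy described above amounts to a simple decision rule, sketched below. The sketch rests on stated assumptions: the quantity threshold is read as 100 micrograms, the steps are represented as plain text labels, and the function name `choose_ligation_strategy` and its parameters are hypothetical rather than part of any laboratory software.

```python
from typing import List


def choose_ligation_strategy(dna_micrograms: float, amplified: bool,
                             host_panel: List[str],
                             threshold_micrograms: float = 100.0) -> List[str]:
    """Return the ordered cloning steps suggested by the quantity of available DNA."""
    if dna_micrograms < threshold_micrograms and not amplified:
        # Scarce DNA: clone into a high-efficiency system first, then amplify
        # the clones and release the inserts for later use.
        return [
            "ligate into high-efficiency cloning system (e.g., SuperCos)",
            "amplify clones",
            "release inserts by restriction enzyme digestion",
        ]
    # Sufficient or already amplified DNA: ligate into one vector per expression host.
    return [f"ligate into vector for {host}" for host in host_panel]


if __name__ == "__main__":
    hosts = ["E. coli", "S. lividans", "S. pombe"]
    print(choose_ligation_strategy(20.0, amplified=False, host_panel=hosts))
    print(choose_ligation_strategy(20.0, amplified=True, host_panel=hosts))
```

Once the inserts from the first branch have been amplified and released, the same function called with `amplified=True` returns the panel-of-vectors branch, which mirrors the two-stage route described in the text.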



5.1.3. HOST ORGANISMS AND VECTORS
The term "host organism" as used herein broadly
encompasses unicellular organisms, such as bacteria, and
multicellular organisms, such as plants and animals. Any
cell type may be used, including those that have been
cultured in vitro or genetically engineered. Any host-vector
systems known in the art may be used in the present
invention. The use of shuttle vectors that can be replicated
and maintained in more than one host organism is
advantageous.

Host organisms or host cells may be obtained from
private laboratory deposits, public culture collections such
as the American Type Culture Collection, or from commercial
suppliers. Such host organisms or cells may be further
modified by techniques known in the art for specific uses.

WO 96/34112 PCT/US96/06003

According to the invention, it is preferable that the
host organism or host cell has been used for expression of
heterologous genes, and is reasonably well characterized
biochemically, physiologically, and/or genetically. Such
host organisms may have been used with traditional genetic
strain improvement methods, breeding methods, fermentation
processes, and/or recombinant DNA techniques. It is
desirable to use host organisms which have been developed for
large-scale production processes, and for which conditions for
growth and for production of secondary metabolites are known.

The host organisms may be cultured under standard
conditions of temperature, incubation time, optical density,
and media composition corresponding to the nutritional and
physiological requirements of the expression host. However,
conditions for maintenance and production of a library may be
different from those for expression and screening of the
library. Modified culture conditions and media may also be
used to emulate some nutritional and physiological features
of the donor organisms, and to facilitate production of
interesting metabolites. For example, chemical precursors of
interesting compounds may be provided in the nutritional
media to facilitate modifications of those precursors. Any
techniques known in the art may be applied to establish the
optimal conditions.

The host organism should preferably be deficient in the
abilities to undergo homologous recombination and to restrict
foreign DNA. The host organism should preferably have a
codon usage similar to that of the donor organism. If
eukaryotic donor organisms are used, it is preferable that
the host organism has the ability to process the donor
messenger RNA properly, e.g., splice out introns.
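
One way to compare the codon usage of a candidate host with that of a donor is to compute codon frequencies from representative coding sequences and measure how closely the two frequency vectors agree. The sketch below uses cosine similarity for this purpose. The choice of cosine similarity and the names `codon_frequencies` and `codon_usage_similarity` are assumptions made for illustration; the specification does not prescribe a particular measure, and the example sequences are invented.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def codon_frequencies(cds: str) -> Dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    # Split into complete triplets, discarding any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_similarity(cds_a: str, cds_b: str) -> float:
    """Cosine similarity of two codon-frequency vectors (1.0 means identical usage)."""
    fa, fb = codon_frequencies(cds_a), codon_frequencies(cds_b)
    if not fa or not fb:
        raise ValueError("each sequence must contain at least one complete codon")
    dot = sum(fa[c] * fb.get(c, 0.0) for c in fa)
    norm_a = sqrt(sum(v * v for v in fa.values()))
    norm_b = sqrt(sum(v * v for v in fb.values()))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    donor = "ATGGCTGCTAAAGAAGCT"   # invented donor coding sequence
    host = "ATGGCCGCTAAGGAAGCC"    # invented host coding sequence
    print(round(codon_usage_similarity(donor, host), 3))
```

In practice the frequencies would be pooled over many genes of each organism, since a single short sequence gives a noisy estimate of codon preference.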

Preferred prokaryotic host organisms may include but are
not limited to Escherichia coli, Bacillus subtilis,
Streptomyces lividans, and Streptomyces coelicolor. Yeast
species such as Saccharomyces cerevisiae (baker's yeast),
Schizosaccharomyces pombe (fission yeast), Pichia pastoris
and Hansenula polymorpha (methylotrophic yeasts) may also be
used. Filamentous ascomycetes, such as Neurospora crassa and
Aspergillus nidulans may also be used. Plant cells such as
those derived from Nicotiana and Arabidopsis are preferred.
Preferred mammalian host cells include but are not limited to
those derived from humans, monkeys and rodents, such as
Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO,
etc. (see Kriegler M. in "Gene Transfer and Expression: A
Laboratory Manual", New York, Freeman & Co. 1990).

A host organism may be chosen which modifies and
processes the expressed gene products in a specific fashion
as desirable. Such modifications (e.g., glycosylation) and
processing (e.g., cleavage) of protein products may be
important for the function of the protein in a biochemical
pathway. Different host cells have characteristic and
specific mechanisms for the post-translational processing and
modification of proteins. Appropriate cell lines or host
systems can be chosen to ensure the correct modification and
processing of the foreign protein expressed. To this end,
eukaryotic host cells which possess the cellular machinery
for proper and accurate processing of the primary transcript,
glycosylation, and phosphorylation of the gene product may be
preferred if the donor organism(s) are eukaryotic.

For example, it has been shown that eukaryotic fungi
share much of the same core molecular biology, and that gene
exchange is possible between many of the most common fungal
species (Gurr et al. 1987, in Gene Structure in Eukaryotic
Microbes, Kinghorn ed., p. 93; Bennet & Lasure 1992, Gene
Manipulations in Fungi, Academic Press, NY). A preferred
example of a eukaryotic host organism is the fission yeast,
Schizosaccharomyces pombe. First, the molecular biology of
S. pombe is highly developed and many major culture and
purification processes and manipulations are routinely
performed. Second, it is unicellular, and thus can easily be
cultured, stored, and manipulated in a laboratory setting.
Third, and of particular importance for use in expressing
mixed eukaryotic DNAs, it is capable of properly splicing and
expressing genes of other species of fungi, plants, and
mammals. Studies of the splicing and processing of
heteronuclear RNA (RNA which contains introns) have indicated
that S. pombe shares with other fungi and higher metazoans a
remarkable similarity of pattern and structure of small
nuclear RNA (snRNA) components needed for splicing. Finally,
many non-S. pombe promoters, some of which derive from
mammalian and plant viruses, are capable of driving moderate
to high levels of gene expression (Forsburg, 1993, Nuc Acids
Res, 21:2955). This feature can allow the shuttling of a
fungal DNA/cDNA library to mammalian cell expression hosts
such as NIH3T3 (fibroblasts), GT1-7 (neuronal), or other cell
types.

A cloning vector or expression vector may be used to
introduce donor DNA into a host organism for expression. An
expression construct is an expression vector containing donor
DNA sequences operably associated with one or more regulatory
regions. The regulatory regions may be supplied by the donor
DNA or the vector. A variety of vectors may be used which
include, but are not limited to, plasmids; cosmids;
phagemids; artificial chromosomes, such as yeast artificial
chromosomes (YACs), and bacterial artificial chromosomes
(BACs, Shizuya et al. 1992, Proc Natl Acad Sci 89:8794-8797);
or modified viruses, but the vector must be compatible with
the host organism. Non-limiting examples of useful vectors
are λgt11, pWE15, SuperCos 1 (Stratagene), pDblet (Brun et al.
1995, Gene, 164:173-177), pBluescript (Stratagene), CDM8,
pJB8, pYAC3, pYAC4 (see Appendix 5 of Current Protocols in
Molecular Biology, 1988, Ed. Ausubel et al., Greene Publish.
Assoc. & Wiley Interscience, which is incorporated herein by
reference).

When the regulatory regions and transcription factors of
the host and donor organisms are compatible, donor
transcriptional regions will be able to bind host factors,
such as RNA polymerase, to effect transcription in the host
organism. If the donor and host organisms are not
compatible, regulatory regions compatible with the host
organism may be attached to the donor DNA fragment in order
to ensure expression of the transferred genes.
In cases where the entire operon, including its own
translation initiation codon, ribosome binding regions, and
adjacent sequences, is inserted into the appropriate cloning
or expression vector, no additional control signals may be
needed. However, in cases where only a portion of the coding
sequence of a gene is inserted, exogenous control signals,
including the translation initiation codon (frequently ATG)
and adjacent sequences, must be provided. These exogenous
regulatory regions and initiation codons can be of a variety
of origins, both natural and synthetic. Both constitutive
and inducible regulatory regions may be used for expression
of the donor DNA. It is desirable to use inducible promoters
when the products of the expression library may be toxic. The
efficiency of the expression may be enhanced by the inclusion
of appropriate transcription enhancer elements (see Bittner
et al. 1987, Methods in Enzymol. 153:516-544).

"Operably-associated" refers to an association in which 
the regulatory regions and the DNA sequence to be expressed
are joined and positioned in such a way as to permit
transcription, and ultimately, translation. The precise
nature of the regulatory regions needed for gene expression
may vary from organism to organism. Generally, a promoter is
required which is capable of binding RNA polymerase and
promoting the transcription of an operably-associated nucleic
acid sequence. Such regulatory regions may include those
5'-non-coding sequences involved with initiation of
transcription and translation, such as the TATA box, capping
sequence, CAAT sequence, and the like. The non-coding region
3' to the coding sequence may also be retained or replicated
for its transcriptional termination regulatory sequences,
such as terminators and polyadenylation sites. Two sequences
of a nucleic acid molecule are said to be
"operably-associated" when they are associated with each other in a
manner which either permits both sequences to be transcribed
onto the same RNA transcript, or permits an RNA transcript
begun in one sequence to be extended into the second
sequence. A polycistronic transcript may thus be produced.
Two or more sequences, such as a promoter and any other
nucleic acid sequences, are operably-associated if
transcription commencing in the promoter will produce an RNA
transcript of the operably-associated sequences. In order to
be "operably-associated" it is not necessary that two
sequences be immediately adjacent to one another.

In addition, the expression vector may contain 
selectable or screenable marker genes for initially 
isolating, identifying or tracking host organisms that 
contain donor DNA. The expression vector may also provide
unique or conveniently located restriction sites to allow 
severing and/or rearranging portions of the DNA inserts in an 
expression construct. 

The expression vector may contain sequences that permit
maintenance and/or replication of the vector in one or more
host organisms, or integration of the vector into the host
chromosome. Such sequences may include but are not limited
to replication origins, autonomously replicating sequences
(ARS), centromere DNA, and telomere DNA. As a result, one or
more copies of an expression construct may be generated and
maintained in a host organism. The expression construct may
be integrated in the host genome or remain episomal in the
host organism.

Generally, it may be advantageous to use shuttle vectors
which can be replicated and maintained in at least two host
organisms, such as, for example, bacteria and mammalian
cells, bacteria and yeasts, bacteria and plant cells, or gram
positive and gram negative bacteria. A shuttle vector may
contain a broad host range replication origin, such as those
derived from IncP or IncQ plasmids, or two or more
replication origins. A shuttle vector may also contain
sequences derived from naturally-occurring plasmids which may
be used to mobilize the library to various compatible host
organisms via conjugative transfer (Hayman et al. 1993,
Plasmid 30:251-257). By using a shuttle vector for
constructing a library, the DNA sequences of the donor
organisms may readily be replicated and transferred from one
host organism to another.

For instance, a preferred and exemplary expression
vector-host organism combination is the cosmid, SuperCos 1,
and the Escherichia coli strain, XL1-Blue MR, both of which
are commercially available from Stratagene (La Jolla, CA).
The vector accepts through a BamHI cloning site DNA inserts
ranging from 30-42 kbp in size, and carries a neomycin
resistance marker (neoR) and an SV40 promoter that is used
for expression in mammalian cells. The vector also contains
an ampicillin resistance gene for selection in prokaryotic
cells. The E. coli host organism is deficient in certain
restriction systems (hsdR, mcrA, mcrCB and mrr), is
endonuclease-deficient (endA1), and recombination deficient
(recA). The host organism cannot cleave inserted DNA
carrying cytosine and/or adenine methylation, which is often
present in eukaryotic DNA and cDNA synthesized using methyl-dNTP
analogs.

Advantages of this system include the utilization of
highly efficient lambda in vitro packaging systems for
initially generating a library in restriction minus, recA
minus, E. coli hosts. Since the quality of source genomic
DNA may be lower than that required for naked DNA
transformations, packaged genomic DNA inserts may be
protected against degradation. Once inside an E. coli host
cell, damaged inserts may be repaired by the host's cellular
DNA repair mechanisms. The system requires only small
amounts of starting genomic DNA (5-10 µg), and size selection
may not be required since the packaging system only accepts
inserts in a certain size range. The initial library in E.
coli may be amplified to produce supercoiled cosmid DNA which
may be used in high efficiency transformation methods for
introduction into other expression host organisms.
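
The built-in size selection of the packaging step can be illustrated with a short Python sketch. It assumes only the 30-42 kbp insert range stated above for SuperCos 1; the function names are hypothetical, and the fragment lengths would come from whatever sizing method is in use.

```python
# Insert size limits, in base pairs, taken from the 30-42 kbp range quoted in the text.
MIN_INSERT_BP = 30_000
MAX_INSERT_BP = 42_000


def packageable(fragment_lengths_bp, min_bp=MIN_INSERT_BP, max_bp=MAX_INSERT_BP):
    """Return the fragment lengths that fall inside the accepted insert window."""
    return [n for n in fragment_lengths_bp if min_bp <= n <= max_bp]


def packageable_fraction(fragment_lengths_bp, min_bp=MIN_INSERT_BP, max_bp=MAX_INSERT_BP):
    """Fraction of a fragment population that could be packaged (0.0 for an empty list)."""
    if not fragment_lengths_bp:
        return 0.0
    kept = packageable(fragment_lengths_bp, min_bp, max_bp)
    return len(kept) / len(fragment_lengths_bp)
```

A low packageable fraction suggests that the digestion or shearing conditions should be adjusted before packaging, even though a separate size-selection step may not be needed.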

The SuperCos 1 vector may be further modified for
cloning in a Streptomyces host by replacing the SV40 origin
of replication and the neoR gene with the Streptomyces origin
of replication (e.g., from the plasmid pIJ101 or pIJ922), and
the thiostrepton resistance gene. This shuttle vector,
termed Streptocos (see Figure 6A), is constructed by
isolation of the 4.0 kb fragment from pIJ699 (Hopwood et al.
1985, Genetic Manipulation of Streptomyces: A Laboratory
Manual, The John Innes Foundation), containing the pIJ101
origin and the thiostrepton resistance gene, by digestion with
KpnI and HindIII. This fragment is blunted at the KpnI site
and cloned into SuperCos 1 at the [...]-HindIII restriction
sites (see Bierman [...] Denis 1992 for related examples).
In addition, sequence elements may be introduced for shuttle
cosmid mobilization via conjugative transfer (Bierman et al.
1992, Gene 116:43-49). Different Streptocos versions
containing Streptomyces-specific promoters may be introduced
[...] PCR, Streptomyces promoter fragments may be generated that
can be directionally cloned into the NotI/EcoRI sites of
SuperCos 1. A variety of known Streptomyces promoters may be
used including ermE, Pptr (1995, Mol Microbiol, 17:969) and
hrdB (Buttner, M.J. 1989, Mol Microbiol, Vol. 3, pp. 1653-
1659). This approach can generally be useful for a wide
range of host-vector systems. Accordingly, SuperCos 1 may be
modified by introduction of host replication origins,
selectable marker genes, and homologous promoters if
necessary.

For combinatorial gene expression libraries using plant
cells as hosts, the expression of the donor coding sequence
may be driven by any of a number of promoters. For example,
preferred strains are described in Principles of Gene
Manipulation, 1985, R.W. Old and S.B. Primrose, 3rd ed.,
Blackwell Scientific Pub.; Vectors: A Survey of Molecular
Cloning Vectors and Their Uses, 1988, R.L. Rodriguez and D.T.
Denhardt, Butterworths Pub.; and A Practical Guide to Molecular
Cloning, 1988, B. Perbal, John Wiley and Sons. Viral promoters
such as the 35S RNA and 19S RNA promoters of CaMV (Brisson et
al. 1984, Nature 310:511-514), or the coat protein promoter
of TMV (Takamatsu et al. 1987, EMBO J. 6:307-311) may be
used; alternatively, plant promoters such as the small
subunit of RuBISCo (Coruzzi et al. 1984, EMBO J. 3:1671-1680;
Broglie et al. 1984, Science 224:838-843); or heat shock
promoters, e.g., soybean hsp17.5-E or hsp17.3-B (Gurley et
al. 1986, Mol. Cell. Biol. 6:559-565) may be used.
Both plant cells and protoplasts may be used as host
cells. Plant hosts may include, but are not limited to,
those of maize, wheat, rice, soybean, tomato, tobacco,
carrots, peanut, potato, sugar beets, sunflower, yam,
Arabidopsis, rape seed, and petunia. Plant protoplasts are
preferred because of the absence of a cell wall, and their
potential to proliferate as cell cultures, and to regenerate
into a plant.

In addition, the recombinant constructs may comprise
plant-expressible selectable or screenable marker genes which
include, but are not limited to, genes that confer antibiotic
resistances (e.g., resistance to kanamycin or hygromycin) or
herbicide resistance (e.g., resistance to sulfonylurea,
phosphinothricin, or glyphosate). Screenable markers
include, but are not limited to, genes encoding
β-glucuronidase (Jefferson, 1987, Plant Molec Biol. Rep 5:387-
405), luciferase (Ow et al. 1986, Science 234:856-859), and the B
protein that regulates anthocyanin pigment production (Goff
et al. 1990, EMBO J 9:2517-2522).

To introduce donor organism DNA into plant cells, the
Agrobacterium tumefaciens system for transforming plants may
be used. The proper design and construction of such T-DNA
based transformation vectors are well known to those skilled
in the art. Such transformations preferably use binary
Agrobacterium T-DNA vectors (Bevan, 1984, Nuc. Acid Res.
12:8711-8721), and the co-cultivation procedure (Horsch et
al. 1985, Science 227:1229-1231). Generally, the
Agrobacterium transformation system is used to engineer
dicotyledonous plants (Bevan et al. 1982, Ann. Rev. Genet.
16:357-384; Rogers et al. 1986, Methods Enzymol. 118:627-
641), but it may also be used to transform as well as
transfer DNA to monocotyledonous plants and plant cells
(see Hernalsteen et al. 1984, EMBO J 3:3039-3041; Hooykaas-
Van Slogteren et al. 1984, Nature 311:763-764; Grimsley et
al. 1987, Nature 325:1677-1679; Boulton et al. 1989, Plant
Mol. Biol. 12:31-40; Gould et al. 1991, Plant Physiol.
95:426-434).

In other embodiments, various alternative methods for
introducing recombinant nucleic acid constructs into plant
cells may also be utilized. These other methods are
particularly useful where the target is a monocotyledonous
plant cell. Alternative gene transfer and transformation
methods include, but are not limited to, protoplast
transformation through calcium-, polyethylene glycol (PEG)-
or electroporation-mediated uptake of naked DNA (see
Paszkowski et al., 1984, EMBO J 3:2717-2722; Potrykus et al.
1985, Molec. Gen. Genet. 199:169-177; Fromm et al., 1985,
Proc. Nat. Acad. Sci. USA 82:5824-5828; Shimamoto, 1989,
Nature 338:274-276) and electroporation of plant tissues
(D'Halluin et al., 1992, Plant Cell 4:1495-1505). Additional
methods for plant cell transformation include microinjection,
silicon carbide mediated DNA uptake (Kaeppler et al., 1990,
Plant Cell Reporter 9:415-418), and microprojectile
bombardment (see Klein et al., 1988, Proc. Nat. Acad. Sci.
USA 85:4305-4309; Gordon-Kamm et al., 1990, Plant Cell 2:603-
618).

For general reviews of plant molecular biology
techniques see, for example, Weissbach & Weissbach, 1988,
Methods for Plant Molecular Biology, Academic Press, NY,
Section VIII, pp. 421-463; and Grierson & Corey, 1988, Plant
Molecular Biology, 2d Ed., Blackie, London, Ch. 7-9.

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV), a baculovirus, is used as a vector
to express donor genes in Spodoptera frugiperda cells. The
donor DNA sequence may be cloned into non-essential regions
(for example the polyhedrin gene) of the virus and placed
under control of an AcNPV promoter (for example the
polyhedrin promoter). These recombinant viruses are then
used to infect host cells in which the inserted gene is
expressed (e.g., see Smith et al. 1983, J Virol 46:584;
Smith, U.S. Patent No. 4,215,051).

In yeast, a number of vectors containing constitutive or
inducible promoters may be used with Saccharomyces cerevisiae
(baker's yeast), Schizosaccharomyces pombe (fission yeast),
Pichia pastoris, and Hansenula polymorpha (methylotrophic
yeasts). For a review see, Current Protocols in Molecular
Biology, Vol. 2, 1988, Ed. Ausubel et al., Greene Publish.
Assoc. & Wiley Interscience, Ch. 13; Grant et al. 1987,
Expression and Secretion Vectors for Yeast, in Methods in
Enzymology, Eds. Wu & Grossman, 1987, Acad. Press, N.Y., Vol.
153, pp. 516-544; Glover, 1986, DNA Cloning, Vol. II, IRL
Press, Wash., D.C., Ch. 3; and Bitter, 1987, Heterologous
Gene Expression in Yeast, Methods in Enzymology, Eds. Berger
& Kimmel, Acad. Press, N.Y., Vol. 152, pp. 673-684; and The
Molecular Biology of the Yeast Saccharomyces, 1982, Eds.
Strathern et al., Cold Spring Harbor Press, Vols. I and II.

In mammalian host cells, a variety of mammalian
expression vectors are commercially available. In addition,
a number of viral-based expression systems may be utilized.
In cases where an adenovirus is used as an expression
vector, the donor DNA sequence may be ligated to an adenovirus
transcription/translation control complex, e.g., the late
promoter and tripartite leader sequence. This chimeric gene
may then be inserted in the adenovirus genome by in vitro or
in vivo recombination. Insertion in a non-essential region
of the viral genome (e.g., region E1 or E3) will result in a
recombinant virus that is viable and capable of expressing
heterologous products in infected hosts (e.g., see Logan &
Shenk, 1984, Proc. Natl. Acad. Sci. (USA) 81:3655-3659). The
Epstein-Barr virus (EBV) origin (OriP) and EBNA-1 as a trans-acting
replication factor have been used to create shuttle
episomal cloning vectors, e.g., EBO-pCD (Spickofsky et al.
1990, DNA Prot Eng Tech 2:14-18). Viral vectors based on
retroviruses may also be used (Morgenstern et al. 1989, Ann
Rev Neurosci, 12:47-65). Alternatively, the vaccinia
promoter may be used. (See, e.g., Mackett et al. 1982, Proc.
Natl. Acad. Sci. (USA) 79:7415-7419; Mackett et al. 1984, J.
Virol. 49:857-864; Panicali et al. 1982, Proc. Natl. Acad.
Sci. 79:4927-4931).

A number of selection systems may be used for mammalian
cells, including but not limited to the Herpes simplex virus
thymidine kinase (Wigler, et al. 1977, Cell 11:223),
hypoxanthine-guanine phosphoribosyltransferase (Szybalska &
Szybalski 1962, Proc. Natl. Acad. Sci. USA 48:2026), and
adenine phosphoribosyltransferase (Lowy, et al. 1980, Cell
22:817) genes, which can be employed in tk-, hgprt- or aprt- cells,
respectively. Also, antimetabolite resistance can be used as
the basis of selection for dihydrofolate reductase (dhfr),
which confers resistance to methotrexate (Wigler, et al.
1980, Natl. Acad. Sci. USA 77:3567; O'Hare, et al. 1981,
Proc. Natl. Acad. Sci. USA 78:1527); gpt, which confers
resistance to mycophenolic acid (Mulligan & Berg 1981,
Proc. Natl. Acad. Sci. USA 78:2072); neomycin
phosphotransferase (neo), which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin, et al. 1981, J. Mol.
Biol. 150:1); and hygromycin phosphotransferase (hyg), which
confers resistance to hygromycin (Santerre, et al. 1984, Gene
30:147).

The present invention also provides specific
modifications of host organisms that improve the performance
of the combinatorial gene expression libraries. When the
libraries are used for the purpose of generating secondary
metabolites, the toxicity of the compounds can lead to
under-representation of these productive host organisms in the
library. In one embodiment of the invention, the host
organism may be modified so that the growth and survival of
the host organism is less adversely affected by the
production of compounds of interest. The increased tolerance
can reduce the loss of host organisms that are producing 
35 ZHT " " e 3 -age as weU as the production 

One preferred modification of the host organism is the
introduction into and/or over-production of active drug
efflux systems in the host organism. Membrane-associated
energy driven efflux plays a major role in drug resistance in
most organisms, including bacteria, yeasts, and mammalian
cells (Nikaido 1994, Science 264:382-388; Balzi et al. 1994,
Biochim Biophys Acta 1187:152-162; Gottesman et al. 1993, Ann
Rev Biochem 62:385). A modified host organism having an
enhanced complement of efflux systems can actively secrete a
broader range of potentially toxic compounds, thus reducing
their accumulation inside the host organism. Negative
feedback mechanisms, such as end-product inhibition of the
metabolic pathway producing the compounds, may be avoided.
Moreover, the isolation of the compounds may be made more
efficient since the compounds of interest do not accumulate
inside the host organisms.
In bacteria, a large number of efflux systems have been
studied which can pump out a wide variety of structurally
unrelated molecules ranging from, for example, polyketide
antibiotics (acrAE genes of E. coli, Ma et al. 1993, J
Bacteriol 175:6299-6313), fluoroquinolones and ethidium
bromide (bmr of Bacillus subtilis and norA of Staphylococcus
aureus, Neyfakh et al. 1993, Antimicrob Agents Chemother
37:128-129), and doxorubicin (drr of Streptomyces peucetius), to
quaternary amines (qacE of Klebsiella aerogenes and mvrC of
E. coli). See Table III for a list of non-limiting examples
of efflux systems. Any such efflux systems may be used in a
prokaryotic host organism.

In yeast, many genes conferring pleiotropic drug
resistance encode efflux systems, and may be useful in the
present invention. For example, the bfr1 gene confers
brefeldin A resistance to Schizosaccharomyces pombe, and the
CDR1 gene of Candida albicans confers resistance to
cycloheximide and chloramphenicol (Prasad et al. 1995, Curr
Genet 27:320-329).

For mammalian cells, the multidrug resistance proteins
which belong to the class of ATP-binding pump proteins may be
used (Juranka et al. 1989, FASEB J. 3:2583-2592; Paulusma et
al. 1996, Science 271:1126-1128; Zaman et al. 1994, Proc.

Table III:

List of compounds that are secreted by
active drug efflux systems

Chemical class: cationic dyes; basic antibiotics; hydrophilic antibiotics; hydrophobic antibiotics; organic cation; uncharged; weak acid; zwitterions; detergent

Specific name: rhodamine-6G; ethidium bromide; acriflavine; puromycin; doxorubicin; novobiocin; macrolide; beta-lactams; tetraphenyl phosphonium; taxol; chloramphenicol; nalidixic acid; mithramycin; fluoroquinolones; SDS

Efflux systems: bmr; acrAE; bmr; drr, mdr; acrAE; mdr; bmr; emr; mdr; bmr; acrAE

One or more efflux systems may be introduced, induced or
overproduced in a host organism. The genes encoding
components of an efflux system may be introduced into a
host organism and expressed using the expression vectors and
techniques described above. In some instances, it may be
advantageous to use an inducible promoter for expression of
the efflux system genes.

5.1.4. COMBINATORIAL NATURAL PATHWAY EXPRESSION
LIBRARIES

The present invention relates to the construction and 
uses of combinatorial gene expression libraries, wherein the 
host organisms contain genetic material encoding natural 
biochemical pathways or portions thereof that is derived from 
a plurality of species of donor organisms, and are capable of 
producing functional gene products of the donor organisms. 
Biochemical pathways or portions thereof of the donor 
organisms are thus functionally reconstituted in individual 
host organisms of a library. Novel activities and compounds 
of such biochemical pathways may be more accessible to 
screening by traditional drug discovery techniques or by 
methods provided herein. 

Either DNA or RNA may be used as starting genetic 
material for preparing such libraries which may include cDNA 
libraries, genomic DNA libraries, as well as mixed 
cDNA/genomic DNA libraries. DNA fragments derived from a 
plurality of donor organisms, e.g., organisms described in 
Section 5.1.1, are introduced into a pool of host organisms,
such that each host organism in the pool contains a DNA 
fragment derived from one of the donor organisms. 

It may be advantageous if the host organism and the
donor organisms share certain genetic features, such as
similar GC content of DNA and common RNA splicing mechanisms,
or physiological features, such as optimal growth
temperature. It may thus be desirable to use a host organism
that is phylogenetically closely related to the donor
organisms. For instance, a prokaryotic host organism may be
more desirable for cloning and expression of operons of other
prokaryotes.
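
As an illustration of matching hosts to donors by GC content, the following Python sketch computes the GC fraction of a sequence and ranks candidate hosts by how close their GC content is to that of the donor. The function names and the dictionary of host sequences are hypothetical; representative genomic sequence for each organism is assumed to be available.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


def rank_hosts_by_gc(donor_seq, host_seqs):
    """Return host names ordered from the closest to the most distant GC content.

    host_seqs maps a host name to a representative DNA sequence.
    """
    donor_gc = gc_content(donor_seq)
    return sorted(host_seqs, key=lambda name: abs(gc_content(host_seqs[name]) - donor_gc))
```

GC content is only one of the shared features mentioned above; a host ranked first by this measure may still be unsuitable on other grounds, such as splicing or growth temperature.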

Donor organisms that are not amenable to traditional
drug discovery or drug development technologies may be
preferred. For example, most marine bacteria are poorly
characterized and not amenable to conventional terrestrial
microbiology protocols. The present invention can simplify
the development of production and purification processes.

The fragment of donor DNA that is transferred may
comprise coding regions encoding functional proteins of a
complete biochemical pathway or portions thereof, as well as
natively associated regulatory regions such as promoters and
terminators. Optimal results may be obtained by using large
prokaryotic genomic DNA fragments which have a greater
probability of encoding an entire biochemical pathway. If
the native function and organization of the transferred DNA
fragment is maintained in the host organism, the genes of the
donor organism may be coordinately expressed. Also provided
are exogenous regulatory regions that may be attached to the
DNA fragments so as to ensure transcription of the
transferred genes in the host organism, thereby replacing or
supplementing transcription initiated from the native
promoters.

Interestingly, many of the genes derived from marine
bacteria have been found to utilize the native promoters to
express functional proteins in E. coli. Thus, genes of
marine microorganisms may be expressed even without the need
to use exogenous regulatory regions. An exemplary list of
marine bacterial genes that use their native promoters in
E. coli is provided in Table IV.

Table IV: List of marine bacterial genes that use
their native promoters in E. coli

Gene(s)                           Genus & Species                               Reference
kappa-carrageenase (cgkA)         Alteromonas carrageenovora, gram(-) aerobe    Barbeyron et al., 1994, Gene 139:105-109
Na+/H+ antiporter (NhaA)          Vibrio alginolyticus                          Nakamura et al., 1994, Biochim Biophys Acta 1190:465-468
phosphodiesterase (cpdP)          Vibrio fischeri, symbiont                     Dunlap et al., 1993, J. Bact. 175(15):4615-4624
chitinase                         Alteromonas sp. Strain O-7                    Tsujibo et al., 1993, J. Bact. 175(1):176-181
tributyltin chloride resistance   Alteromonas sp. M-1, gram(-) rod              Fukagawa et al., 1993, Biochem. Biophys. Res. Comm. 194(2):733-740
dagA-complementing                Alteromonas haloplanktis, gram(-)             MacLeod et al., 1992, Mol. Micro. 6(18):2673-2681
vibriolysin (nprV)                Vibrio proteolyticus, gram(-)                 David et al., 1992, Gene 112:107-112
tetracycline resistance           Vibrio salmonicida, aerobe                    Sorum et al., 1992, Chemo. 36(3):611-615
melanin synthesis (melA)          Shewanella colwelliana, gram(-) periphyte     Fuqua et al., 1991, Gene 109:131-136
DNA modification cluster          Hyphomonas jannaschiana, thermophile          Danaher et al., 1990, Gene 89:129-133

are also known to be clustered. Gene
[...] can be activated and produced
[...] biosynthetic pathways [...] are also
[...] used as donor organisms, it is likely that genes that are
functionally related in a biosynthetic pathway would be
contained in one clone.

[...] organisms having compact genomes that contain
relatively [...] non-coding regions [...]
aspects, the donor organisms [...] in length for
[...]. The number of
independent clones required in a library [...] (Clarke et
al. 1976, Cell 9:91-99):

    N = ln(1 - P) / ln(1 - f)

where

N = the number of recombinant clones necessary in the library
P = the probability that a sequence is represented
f = the fractional proportion of the genome in a single
    recombinant clone
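
The library-size estimate of Clarke et al. (1976), N = ln(1 - P) / ln(1 - f), can be evaluated directly. The Python sketch below takes f as the ratio of average insert size to genome size; the function name, the default probability of 0.99, and the example sizes mentioned afterwards are illustrative assumptions.

```python
from math import ceil, log


def clones_required(insert_bp, genome_bp, probability=0.99):
    """Number of independent clones N such that a given sequence is present
    in the library with the stated probability: N = ln(1 - P) / ln(1 - f),
    where f = insert_bp / genome_bp."""
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert size must be positive and smaller than the genome")
    f = insert_bp / genome_bp
    # Round up, since a fractional clone cannot be picked.
    return ceil(log(1 - probability) / log(1 - f))
```

For example, `clones_required(35_000, 8_000_000)` estimates the number of cosmid clones needed for a hypothetical 8 Mbp genome with 35 kbp inserts. Raising the target probability or shrinking the insert size increases the number of clones required.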

Following these calculations, [...]. Thus, [...] a genome size similar to
[...] (Maniatis [...] Cold Spring Harbor [...]).

[...] the combinatorial gene expression libraries [...]
a donor organism can potentially [...]
occurring plasmids, may be used. Alternatively, RNA of a
donor organism may be used. RNA, preferably messenger RNA
(mRNA), may be extracted, purified and converted to [...].
[...] primers may be used for priming first strand synthesis
of cDNA [...] optionally be amplified by polymerase chain
reaction (PCR).

Genomic DNA and RNA may be extracted and purified by the
procedure provided in Section 5.1.2 or by those that are
known in the art. For filamentous fungi and bacteria, such
procedures may comprise any of several techniques including
a) rapid SDS/high salt lysis of protoplasts prepared from
young mycelia grown in liquid culture and immediate
extraction with equilibrated phenol; b) rapid lysis of
protoplasts in guanidinium isothiocyanate followed by
ultracentrifugation in a CsCl gradient; or c) isolation of
high molecular weight DNA from protoplasts prepared in
agarose plugs and pulsed field gel electrophoresis. For
bacteria, an alternative procedure of lysis by
lysozyme/detergent, incubation with a non-specific protease,
followed by a series of phenol/chloroform/isoamyl alcohol
extractions may be useful.

For optimal results, large random prokaryotic genomic 
DNA fragments are preferred for the higher probability of 
containing a complete operon or substantial portions thereof. 
The genomic DNA may be cleaved at specific sites using 
various restriction enzymes. Random large DNA fragments
(greater than 20 kbp) may be generated by subjecting genomic
DNA to partial digestion with a frequent-cutting restriction
enzyme. The amount of genomic DNA required varies depending
on the complexity of the genome being used. Alternatively,
the DNA may be physically sheared, for example, by passage
through a fine-bore needle, or by sonication.

Prior to insertion into a vacant expression vector, such 
DNA inserts may be separated according to size by standard 
techniques, including but not limited to, agarose gel 
electrophoresis, dynamic density gradient centrifugation, and
column chromatography. A linear 10-40% sucrose gradient is
preferred. The insertion can be accomplished by ligating the
DNA fragment into an expression vector which has
complementary cohesive termini. The amounts of vector DNA
and DNA inserts used in a ligation reaction are dependent on
their relative sizes, and may be determined empirically by
techniques known in the art. However, if the complementary
restriction sites used to fragment the DNA are not present in
the expression vector, the ends of the DNA molecules may be
enzymatically modified, for example, to create blunt ends.
Alternatively, any site desired may be produced by ligating
nucleotide sequences, i.e., linkers or adaptors, onto the DNA
termini; these ligated linkers or adaptors may comprise
specific chemically-synthesized oligonucleotides encoding
restriction endonuclease recognition sequences. In an
alternative method, the cleaved expression vector and DNA
inserts may be modified by homopolymeric tailing.
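The dependence of vector and insert amounts on their relative sizes can be illustrated with the conventional molar-ratio calculation. This is a sketch only: the function name, the example sizes and the choice of molar ratio are assumptions made for illustration, and, as stated above, the actual amounts are determined empirically.

```python
def insert_mass_ng(vector_mass_ng: float,
                   vector_bp: float,
                   insert_bp: float,
                   insert_to_vector_molar_ratio: float = 1.0) -> float:
    """Mass of insert DNA giving the requested insert:vector molar ratio.

    For double-stranded DNA, mass is proportional to length times moles,
    so the insert mass scales with the insert/vector size ratio.
    """
    if min(vector_mass_ng, vector_bp, insert_bp,
           insert_to_vector_molar_ratio) <= 0:
        raise ValueError("all arguments must be positive")
    size_ratio = insert_bp / vector_bp
    return vector_mass_ng * size_ratio * insert_to_vector_molar_ratio


if __name__ == "__main__":
    # Hypothetical example: 100 ng of an 8 kbp vector, 40 kbp inserts
    for ratio in (0.5, 1.0, 3.0):
        print(ratio, insert_mass_ng(100.0, 8_000, 40_000, ratio))
```

A calculation of this kind only supplies a starting point; a small series of trial ligations around the computed amount is the usual way to settle the ratio.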

After ligation of vector DNA to DNA inserts, the 
expression constructs are introduced into the host organisms. 
A variety of methods may be used, which include but are not 
limited to, transformation, transfection, infection,
conjugation, protoplast fusion, liposome-mediated transfer,
electroporation, microinjection and microprojectile
bombardment. In specific embodiments, the introduction of
bacteriophage or cosmid DNA into an E. coli host is carried
out by in vitro packaging of the DNA into bacteriophage
particles, then allowing these particles to infect E. coli
cells. Other naturally-occurring mechanisms of DNA transfer
between microorganisms may also be used, e.g., bacterial
conjugation.

After the host cells containing expression constructs 
are pooled to form a library, they can be amplified and/or
replicated by techniques known in the art. The purpose of
amplification is to provide a library that can be used many
times. Amplification may be achieved by plating out the
library, allowing the bacteria to grow, and harvesting the
phage or bacteria for storage.

Alternatively, the library may be stored in an ordered 
array. The bulk of the library can be plated out at low 
density to allow formation of single, discrete plaques or 
colonies, followed by transfer of individual plaques or 
colonies into the wells of coded multi-well master plates,
e.g., 96-well plates or 384-well plates. The individual
clones are allowed to grow in the wells under the appropriate
conditions. The coded master plates can be used as an
archival source to replicate each clone separately into one
or more working plates. Thus, each clone in the library may
be handled and assayed individually. The coded archival
plates may be sealed and stored for future use. Replication
and transfer of the clones may be done with a multi-pin
replicator, or multi-channel devices for fluid handling.
Preferably, all or most of the transfers and manipulations
are performed by laboratory robots (Bentley et al. 1992,
Genomics 12:534-541).
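The coding of master plates described above amounts to assigning each clone a plate number and a well position. The following sketch shows one possible coding scheme; the row-major fill order, the function name and the address format are assumptions made for illustration and are not prescribed by the text.

```python
import string

# wells per plate -> (number of rows, number of columns)
PLATE_FORMATS = {96: (8, 12), 384: (16, 24)}


def well_address(clone_index: int, wells: int = 96) -> tuple:
    """Map a zero-based clone index to (plate number, well label).

    Plates are numbered from 1 and filled row by row (A1, A2, ...).
    """
    if clone_index < 0:
        raise ValueError("clone_index must be non-negative")
    rows, cols = PLATE_FORMATS[wells]
    plate, offset = divmod(clone_index, wells)
    row, col = divmod(offset, cols)
    return plate + 1, f"{string.ascii_uppercase[row]}{col + 1}"


if __name__ == "__main__":
    for index in (0, 95, 96, 1000):
        print(index, well_address(index))
```

Keeping the mapping deterministic means that a working plate replicated from a master plate inherits the same addresses, so an assay result can be traced back to the archived clone.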

The libraries of the invention may be preserved by 
lyophilization, or cryopreservation in a freezer (at -20°C to 
-100°C) or under liquid nitrogen (-176°C to -196°C). 

Host organisms containing donor DNA in a library may be 
identified and selected by a variety of methods depending on
the host-vector system used. In one approach, such host
organisms are identified and selected upon the presence or
absence of marker gene functions, e.g., thymidine kinase
activity, resistance to antibiotics, such as kanamycin,
ampicillin, bleomycin, or thiostrepton, production of
pigment, such as melanin, and resistance to methotrexate.
Alternatively, a change in phenotype or metabolism of the
host organism, indicated by metabolic testing, foci formation
in tissue culture, or occlusion body formation in baculovirus
may be used. Once selected for the presence of donor DNA, a
series of enzymatic assays or metabolic tests may be carried
out on the clones for further characterization.

To characterize the donor DNA inserts in a library of 
clones containing donor DNA or a portion thereof, mini 
preparations of DNA and restriction analysis may be performed
with a representative set of clones. The results will
provide a fingerprint of donor DNA size and restriction
patterns that can be compared to the range and extent of
insert DNA which is expected of the library.

5.1.5. COMBINATORIAL CHIMERIC PATHWAY EXPRESSION LIBRARIES
The present invention also relates to the construction 
and uses of combinatorial chimeric pathway expression 
libraries, wherein the host organisms contain randomly 
concatenated genetic materials that are derived from one or
more species of donor organisms, and are capable of producing
functional gene products of the donor organisms. A
substantial number of host organisms in the library may
contain a random and unique combination of genes derived from
one or more species of donor organism(s). Coexpression of
the transferred genes may be effected by their respective
native regulatory regions or by exogenously supplied
regulatory regions. The plurality of gene products derived
from the different donor organisms interact in the host
organism to generate novel chimeric metabolic pathways and
novel compounds. Novel activities and compounds of such
chimeric pathways may become more accessible to screening by
traditional drug discovery techniques or by methods provided
herein.

While not limited to any theory of how novel pathways or
compounds are generated in a combinatorial chimeric pathway
gene expression library, the coexpression of functional
heterologous genes derived from one or a plurality of species
of donor organisms enables the gene products to interact in
vivo with each other, and with elements of the host organism.
Through such interactions, new sets of biochemical reactions
will arise, some of which can act in concert to form a
chimeric biochemical pathway. The heterologous gene products
may encounter substrates, cofactors and signalling molecules
that are not present in their respective donor organism.
Such substrates, cofactors and signalling molecules may be
supplied by the host organism, by other heterologous gene
products that are coexpressed in the same host organism, or
from the medium.

Moreover, some of the heterologous gene products may be
modified structurally, and compartmentalized or localized
differently during biosynthesis in the host organism. Some
of the heterologous gene products may be exposed to a host
cellular environment that is different from that of their
respective donors.

It is envisioned that some heterologous gene products 
may also act on the host organism and modify the host
cellular environment. Elements of the host cellular
environment that may affect, or be affected by, the function
of heterologous gene products may include but are not limited
to concentrations of salts, trace elements, nutrients,
oxygen, metabolites, energy sources, redox states, and pH.
Some heterologous gene products may also interact with host 
gene products which can result in the modification of the 
host's metabolic pathways. 

Depending on the combination of heterologous genes, 
novel chimeric biochemical pathways and novel classes of
compounds that do not exist in nature may be formed in the
host organisms of the library. In combinatorial chimeric
pathway expression libraries, the genetic resources of the
donor organisms are multiplied and expanded to provide a
diversity of chemical structure that may not be found in
individual organisms. The libraries so prepared may be
screened using traditional methods or methods provided by the
present invention. Thus, the novel pathways and compounds
are made more accessible to drug screening.

Any of the donor organisms described in Section 5.1.1
may be used in preparing a combinatorial chimeric pathway
expression library. Donor organisms may be selected on the
basis of their known biological properties, or they may be a
mixture of known and/or unidentified organisms.

The combinatorial chimeric pathway expression libraries
of the invention may be assembled according to the principles
described in Section 5.1.3. In order to allow the random
concatenation of DNA fragments from multiple species of donor
organisms, the procedure for library assembly may be modified
by including the following steps: generation of smaller
genomic DNA fragments, ligation with regulatory sequences
such as promoters and terminators to form gene cassettes, and
concatenation of the gene cassettes.

Insert DNAs may be complementary DNA (cDNA) derived from 
mRNA, and/or fragments of genomic DNA. The DNA or RNA of 
different species of donor organisms may be copurified, or
they may be isolated separately and then combined in specific
proportions. The random mixing of insert DNAs can be done at
any stage prior to insertion into the cloning or expression
vector.

Methylated nucleotides, e.g., 5-methyl-dCTP, may be used
in cDNA synthesis to provide protection against enzymatic
cleavage, and allow directional cloning of the cDNA inserts
in the sense orientation relative to the promoter and
terminator fragments.

Random fragments of genomic DNA in the range of 2-7 kbp
may be generated by partial digestion with a restriction
enzyme having a relatively high frequency of cutting sites,
e.g., Sau3AI. Partial digestion is monitored and confirmed
by subjecting aliquots of the samples to agarose gel
electrophoresis.
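The reason a frequent cutter must be used in partial rather than complete digestion can be shown with a simple estimate. The sketch below assumes a random sequence model in which bases occur independently with frequencies set by the GC content; this model, the function names and the example GC values are assumptions for illustration. Sau3AI recognizes the four-base site GATC.

```python
def mean_site_spacing_bp(site: str, gc_fraction: float = 0.5) -> float:
    """Expected distance between recognition sites in a random sequence."""
    if not 0.0 < gc_fraction < 1.0:
        raise ValueError("gc_fraction must lie strictly between 0 and 1")
    base_freq = {
        "G": gc_fraction / 2, "C": gc_fraction / 2,
        "A": (1 - gc_fraction) / 2, "T": (1 - gc_fraction) / 2,
    }
    site_probability = 1.0
    for base in site.upper():
        site_probability *= base_freq[base]
    return 1.0 / site_probability


def fraction_of_sites_to_cut(site: str, target_mean_bp: float,
                             gc_fraction: float = 0.5) -> float:
    """Approximate fraction of sites that must be cleaved to give
    fragments with the requested mean length."""
    return mean_site_spacing_bp(site, gc_fraction) / target_mean_bp


if __name__ == "__main__":
    print(mean_site_spacing_bp("GATC"))            # equal base frequencies
    print(mean_site_spacing_bp("GATC", 0.72))      # hypothetical high-GC donor
    print(fraction_of_sites_to_cut("GATC", 4500))  # mid-point of 2-7 kbp
```

Under this model a complete digest would give fragments of a few hundred base pairs, so only a small fraction of sites should be cleaved to reach the 2-7 kbp range, which is why the digestion is followed on a gel.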

Exogenous regulatory regions, such as constitutive or
inducible promoters and terminators, may be provided to drive
expression of the transferred genes. When the host and donor
expression systems are not compatible, it is essential to
provide such regulatory sequences. PCR may be used to
generate various promoter and terminator fragments that are
specific to a particular expression host, and have defined
restriction sites on their termini. Any method for
attachment of a regulatory region to the DNA inserts may be
used. Treatment with the Klenow fragment and a partial set
of nucleotides, i.e., a partial fill-in reaction, may be used
to create insert DNA fragments which will only ligate
specifically to promoter and terminator fragments with
compatible ends.

The present invention provides a method involving the
use of gene cassettes which contain two copies of a
promoter, oppositely positioned on either side of a unique
restriction site. Any DNA inserted into this restriction
site will be transcribed on both strands from the promoters,
respectively, from both sides.
reSPe r h present invention also provi.es an alternative 

Tne pres. cassettes which contain a 

5 m ethod involving the use of gene side o£ . DNA 

t :rtLT"e z^:z:^ - 

^ed/the /enas and ends "J^J^^? 
haV e unigue "•"^.'^ 
.ia. the- -promoter fragments and the 5 en 

fragments respectively. mT ^r-ible 

,„ 

^-ize of approximately 1-10 kbp. 
15 cassettes havxng a mean sxze off transcription units are 
Concatemers comprising multiple transcrip 

i e j m n»r to that used in peptide 

de proteoti The so lid phase allows separation of 

transcription units. The solid phase allows separation of
the concatemers from the unligated DNA fragments after each
addition cycle. When concatenation is complete, the
concatemers are released by cleavage with a restriction
enzyme, such as an intron nuclease, that cleaves a unique and
very rare site adjacent to the solid phase to reduce the
probability of cleaving the concatenated DNA. Concatemers
may then be inserted into a cloning vector to form
expression constructs which are introduced into
appropriate host organisms. Alterna ^ ain for 

• - ar , p coli recA minus strain .01 
may be transformed into an E organi sms. 

35 fragments, the preparation of g-n- ^ 



insert and vector, 

the DNA inserts, and tne i^y--" " 
provided in Sections 5.4 and 5.5.

Once the combinatorial chimeric pathway expression
library is assembled, it can be
replicated, pre-screened and screened essentially in the same
manner as described in Section 5.1.3.



5 

5.1.6 



Spools, .together from one or m spec's of InT 

instead of using only the total tJTT organisms, 
the donor organism J thi 9en ° mlC " ° r CDNA ° f 

of clones th ' a PP roach -i" reduce the number 

clones that need to be screened and increase the 

related to or are involved in producing compounds of 
20 interest. The initial DNA library, preferably a cosmid 
library and not necessarily an expression library 12 

further pre- screening . if the initial library is an 

2s ::rr 10 " ubrary ' dna in the «™> be 

s T ^ eXPreSSEd ^ 3 h ° St f - PrcductLn su 

s E. coir or Streptomyces lividans. More than one initial 
library may be pre-screened, and DNA from all the positive
clones can be pooled and used for making the biased
combinatorial gene expression library.

The initial library may be amplified so that DNA of the
donor organisms can be pre-screened in a variety of host
organisms. For example, once a gene expression library in
s ecTaT:re C ; S h Jda " £ " 9ener " ed - U * — nto 

S. rimosus that produces oxytetracycline, or S. parvulus
that produces actinomycin D. If the expression vector
contains the appropriate sequences for genetic transfer in
naturally occurring plasmids, such sequences may be used to
mobilize the library to various compatible hosts via
conjugative transfer.

The probes used for pre-screening may be derived from
any cloned biosynthetic pathway, such as the polyketide
biosynthetic loci, as these are the best characterized
biosynthetic loci and there is considerable sequence
conservation between the known clusters, e.g., actI
(actinorhodin biosynthesis - Malpartida et al. 1987 Nature
325:818-820), whiE (spore pigment biosynthesis - Blanco et
al. 1993 Gene 130:107-16) and eryAI (Donadio et al. 1991
Science 252:675-679). Similar principles may be applied to
other antibiotic or secondary metabolite biosynthetic loci.
For example, the cloned peptide synthetase genes in low-GC
gram positive bacteria, such as Bacillus (Stachelhaus et al.
1995 Science 269:69-72) and in high-GC gram positive
bacteria, such as actinomycetes species that produce
thiostrepton, virginiamycin, valinomycin and actinomycin, may
have enough sequence similarities to be used as probes to
identify new biosynthetic loci in both groups of bacteria.
Other cloned biosynthetic pathways, such as peptide synthases
and aminoglycoside synthases, can also provide probes for
pre-screening the initial libraries.

Alternatively, the initial DNA library may be screened 
by probes derived from DNA that encode proteins involved in
secondary metabolism. Such probes may be prepared by
subtracting non-coding DNA and DNA encoding proteins that
relate to primary metabolism biosynthetic pathways from total
DNA. The remaining DNA is thus biased toward coding regions
that encode proteins involved in secondary metabolism.
Details of the subtraction procedure are provided in Section
5.3.5.



5.2. SCREENING COMBINATORIAL EXPRESSION LIBRARIES
The drug discovery system of the present invention
further encompasses novel methods for screening combinatorial
expression libraries. While standard methods of screening
expression libraries, such as antibody binding and ligand
binding, can also be used with expression libraries of the
present invention, the libraries can be adapted to a reporter
regimen tailored to identify host organisms that are
expressing the desirable pathways and metabolic products.
The methods claimed herein enable the management of
large sample numbers with minimal handling to permit
efficient and high-throughput detection and isolation of
productive clones in the library. The libraries may be
pre-screened for a broad range of activities, for the production
of a class of compounds or for the presence of relevant DNA
sequences. The libraries may also be used directly with a

nsolated" t ^ ^ ^ 

isolated population of cells may readily be cultured
and subjected to further analysis for

the production of novel compounds. The genes encoding the
metabolic pathway that leads to production of the novel
activity or compound may be delineated by characterizing the
genetic material that was introduced into the isolated
clones. Information on the genes and the pathway, and the
clones, will greatly facilitate drug ^ y 
production. 

As used herein, "library clones" or "library
cells" refers to host cells or organisms in a combinatorial
gene expression library that contain at least one fragment of
donor DNA that may encode a donor metabolic pathway or a
component thereof. The term "positive clones" or "positive
cells" refers to library clones or cells that produce a
signal by virtue of the reporter regimen. The term
"productive cells" or "productive clones" refers to host
cells or organisms in the library that produce an activity or
compound of interest, in distinction from the remainder
"non-productive cells" in the library.

The term "pre-screen" refers to a general biological or
biochemical assay which indicates the presence of an
activity, a compound or a gene of interest. The term
"screen" refers to a specific therapy-oriented biological or
biochemical assay which is directed to a specific disease or
clinical condition, and employs a target. The term "target"
refers generally to whole cells as well as macromolecules,
such as enzymes, to which compounds under test are exposed in
a screen. The use of both pre-screens and screens generally
embodies visual detection or automated image analysis of a
colorigenic indicator, fluorescence detection by
fluorescence-activated cell sorting (FACS) or the use of a
magnetic cell sorting system (MACS) performed on a population
of library cells in the presence of a reporter regimen.

The methods of the invention provide alternative but not 
mutually exclusive approaches to generation of detectable 
signal associated with productive cells for the purpose of 
detecting and isolating these cells of interest. A reporter 
can be a molecule that enables directly or indirectly the
generation of a detectable signal. For example, a reporter
may be a light emitting molecule, or a cell surface molecule
that may be recognized specifically by other components of
the regimen. A reporter regimen comprises a reporter and
compositions that enable and support signal generation by the
reporter. The reporter regimen may include live indicator
cells, or portions thereof. Components of a reporter regimen
may be incorporated into the host organisms of the library,
or they may be co-encapsulated with individual or pools of
library cells in a permeable semi-solid medium to form a
discrete unit for screening.

To facilitate detection of compounds of interest as 
described in the following text, absorptive materials such as 
neutral resins, e.g., Diaion HP20 or Amberlite XAD-8 resin, 
may be added to cultures of library cells (Lam et al. 1995, J
Industrial Microbiol 15:453-456). Since many secondary
metabolites are hydrophobic molecules, the release or
secretion of such metabolites may lead to precipitation on
the cell exterior. Inclusion of such resins in the culture
causes the sequestration to occur on the resin which may be
removed from the culture for elution and screening.

In one embodiment of the invention, the host organisms
are engineered to contain a chemoresponsive construct,
comprising a gene encoding a reporter molecule operably-
associated with a chemoresponsive promoter that responds to
the desired class of compounds or metabolites to be screened
in the expression library. In the presence of the desirable
activity or compound, the chemoresponsive promoter in a
positive clone is induced to initiate transcription of the
operably-associated reporter gene. The positive cell is
identified by detectable signals generated by the expression
of the reporter gene.

In an alternative embodiment, a physiological probe can 
be used which generates a signal in response to a 
physiological change in individual cells as a result of the 
presence of a desirable activity or compound. Such a probe
may be a precursor of a reporter molecule that is converted
directly or indirectly to the reporter molecule by an
activity or compound in the biochemical pathway sought. Upon
contact with a productive cell, the physiological probe or
reporter precursor generates a detectable signal which
enables identification and/or isolation of the productive
cell. Contact may be effected by direct addition of the
probe or precursor to the library cells. Alternatively,
contact may be effected by encapsulation and diffusion of the
probe or precursor to the library cells during screening.

In yet another embodiment of the invention, indicator 
cells may be used to signal the production of a desirable 
activity or compound, thereby enabling identification and/or 
isolation of productive cells in the library. Whole live or 
fixed indicator cells, or cellular fractions thereof, may be
mixed or co-encapsulated with individual or pools of library
cells. Indicator cells are selected for their biological
properties which are responsive to the presence of the
desirable activity or compound. Indicator cells may be the
target cells of the desirable compound. Alternatively,
indicator cells may be used in conjunction with a reporter to
generate a detectable signal.






Pre screens and screens for each library are chosen 
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gelation using materials such as agarose al ■ 
carrageenan. 9 ° se ' alginate or 

S Parades <Ka mar h ^ ^7"-" Pities o£ the 
FACS works on the basrs of f ^ 151 = 15 «-"5, . 

™. ult . in addicion o£ a s J 

parade. The change allows elect ^ C ° the 

. 10 posUive ana negative parcTcles fro Sepa ""°" »f 

Particles may be dire J d "^ lt f ™-." S.p.»t«, 

9«-«U or 38«-„ell places P S " ed lnt ° "dividual wells of 

-se d r t :; lr a rr for separati - 

» IOC. drameter, (r-ynal ^rospneres (0 . 5 . 

modifi cations can . ' / ' ' A va riety of useful 

includ.ng co^^™ °" T 

recognr.es a cell-surfa"! " ^ ^"""liy 

** ^ tl2atlM oTi::;:: : t 9 :: ~ r- — ^ 

*• «» be incorporated into Lst cells that ^ re9i "" en 
-gnetogenic "P°".r Proteins, ^t^:;^ 
case, encapsulated cells that generate a oo ^ 
as magnetic microspheres The T PO^trve signal act 

Physically manipulated by expos^eT^ ^"^^ «» be 
- example, the selected ^cj^"" J^'"*"*"* F ~ 
application of a magnet to ^ ^ 0 * 

vessel. siae of the reaction 



30 AcrnH . 5 " 2 - 1 - ^OSTER_CONS Im2 CTS 

- -e r;:::; 9 ~ r ncion - the -< — 

-Porter construct' ^^Tc^" 10 3 
operably-associated vlth a "1 Chem ° res P°"3ive promoter 
. -/or the construct ^ L^T".^."- T 

■ accessory prote.ns that ar e evolved V-^T" 9 
transection from the chemorespons," " t " °* 

Production of sl on a u Promoter or the 



gnals . 






A chemoresponsive promoter is any double-stranded DNA
sequence that is capable of binding an RNA polymerase and
initiating or modulating transcription of an operably-
associated reporter gene only in the presence of a certain
kind of activity or a certain class of compounds.
Preferably, the chemoresponsive promoter has no or only a
negligible level of constitutive background transcriptional
activity in the host organism in the absence of the inducing
activity or compound. A chemoresponsive promoter that
responds negatively to the presence of an activity or compound
by decreasing or ceasing transcriptional activity may also be
used.

Promoters useful in the present invention may include,
but are not limited to, promoters for metabolic pathways,
biodegradative pathways, cytochromes and stress response
(Orser et al. 1995, In vitro Toxicol 8:71-85), such as heat
shock proteins. For example, the Pm promoter of the
Pseudomonas TOL plasmid meta-cleavage pathway and its
positive regulator XylS protein, which is inducible and
modulated by a range of benzoates and halo- or alkylaromatic
compounds, may be used (Ramos et al. 1988, FEBS Letters
226:241-246; de Lorenzo et al. 1993, Gene 130:41-46; Ramos et
al. 1986, Proc Natl Acad Sci 83:8467-8471; Mermod et al.
1986, J Bacteriol 167:447-454). Other non-limiting examples
of chemoresponsive promoters are promoters relating to
phosphonate utilization (Metcalf et al. 1993, J Bacteriol
175:3430-3442); promoters sensitive to cis-cis-muconate
(Rothmel, 1990); promoters sensitive to antibiotics and
salicylates (Cohen et al. 1993, J Bacteriol 175:7856-7862;
Cohen et al. 1993, J Bacteriol 175:1484-1492); promoters
from the arsenic and cadmium operons from Staphylococcus
aureus (Corbisier et al. 1993, FEMS Letters 110:231-236);
sfiA (Quillardet et al. 1982, Proc Natl Acad Sci 79:5971-
5975), and zwf (Orser et al. 1995, supra).

A reporter gene encodes a reporter molecule which is
capable of directly or indirectly generating a detectable
signal. This includes colorigenic or magnetogenic reporters
as well as any light-emitting reporter such as
bioluminescent, chemiluminescent or fluorescent proteins,
which include but are not limited to the green
fluorescent protein (GFP) of Aequorea victoria (Chalfie et
al. 1994, Science 263:802-805), a modified GFP with enhanced
fluorescence (Heim et al. 1995, Nature 373:663-4), the
luciferase (luxAB gene product) of Vibrio harveyi (Karp,
1989, Biochim Biophys Acta 1007:84-90; Stewart et al. 1992, J
Gen Microbiol 138:1289-1300), and the luciferase from
firefly, Photinus pyralis (De Wet et al. 1987, Mol Cell Biol
7:725-737). Any fluorigenic or colorigenic enzymes may be
used, which include but are not limited to beta-galactosidase
(LacZ, Nolan et al. 1988, Proc Natl Acad Sci USA 85:2603-
2607), and alkaline phosphatase. Any cell surface antigen
may be used, for example, E. coli thioredoxin-flagellin
fusion protein, i.e., E. coli thioredoxin (the trxA gene)
expressed as a fusion protein with flagellin (the fliC gene)
on the surface of E. coli flagellae (Lu et al. 1995,
Bio/Technology 13:366-372).

An exemplary chemoresponsive reporter construct provided
herein is pERD-20-GFP which contains the Pm promoter and the
XylS gene of Pseudomonas (Ramos et al. 1988, FEBS Letters
226:241-246) that are responsive to certain classes of
benzoates, resulting in transcription and translation
(expression) of the reporter, GFP (see Figure 6).

Different promoter sequences may be generated by PCR and 
attached to the coding regions of GFP or flagellin-
thioredoxin reporter. Genomic and plasmid DNA containing the
promoter of interest may be purified from the relevant
species using standard DNA purification methods, and
resuspended in TE. Primers may be synthesized corresponding
to the 5' and 3' boundaries of the promoter regions with
additional sequences of restriction sites to facilitate
subcloning. The amplification reactions are carried out in a
thermocycler under conditions determined to be acceptable for
the selected template and primers. The reaction products are 
separated by agarose gel electrophoresis, and subcloned using
5.2.3.
mixer at 2000 rpm. A volume of library cells of E. coli or
yeast such as Schizosaccharomyces pombe and Saccharomyces
species; or spores for Streptomyces species; Bacillus
subtilis; and filamentous fungus such as Aspergillus and
Neurospora species; is added to the sodium alginate solution
so that 1-5 cells are encapsulated per droplet. The mixture
is allowed to sit for at least 30 minutes to degas, and is
then extruded through any device that causes the formation of
discrete droplets. One such device is a syringe with a 25
gauge needle. The droplets are formed by adding the sodium
alginate solution drop-wise into a beaker of gently stirring
135 mM calcium chloride solution. Droplets are allowed to
solidify for 10 minutes, and are then transferred to a
sterile flask where the calcium chloride solution is removed
and replaced with a suitable growth medium. Encapsulated
library cells can be grown under standard conditions.
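The volume of library cells needed to obtain 1-5 cells per droplet can be estimated before encapsulation. The sketch below assumes that droplets are spherical and that cells are distributed among droplets at random, so that the count per droplet follows a Poisson distribution; these assumptions, the function names and the example droplet diameter are illustrative and are not taken from the protocol.

```python
import math


def droplet_volume_ml(diameter_um: float) -> float:
    """Volume of a spherical droplet in millilitres (1 ml = 1 cm^3)."""
    diameter_cm = diameter_um * 1e-4
    return math.pi / 6.0 * diameter_cm ** 3


def cells_per_ml_for_mean(mean_cells: float, diameter_um: float) -> float:
    """Cell concentration in the alginate mix giving the requested mean
    number of cells per droplet."""
    return mean_cells / droplet_volume_ml(diameter_um)


def prob_cells_in_range(mean_cells: float, low: int = 1, high: int = 5) -> float:
    """Poisson probability that a droplet holds between low and high cells."""
    return sum(
        math.exp(-mean_cells) * mean_cells ** k / math.factorial(k)
        for k in range(low, high + 1)
    )


if __name__ == "__main__":
    diameter_um = 2000.0  # hypothetical droplet diameter
    for mean in (1.0, 2.0, 3.0):
        print(mean,
              round(cells_per_ml_for_mean(mean, diameter_um)),
              round(prob_cells_in_range(mean), 3))
```

Under the Poisson assumption some droplets are always empty or overloaded, so the mean loading is a compromise; the droplet diameter should be measured, e.g. by microscopy, rather than assumed.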

Microdroplets may be generated by any method or device 
that produces small droplets, such as but not limited to, 
a two-fluid annular atomizer, an electrostatic droplet
generator, a vibrating orifice system, and emulsification.
Other methods for preparing semi-solid droplets are well
known in the art; see, for example, Weaver, U.S. patent
4,399,219.

The following example is a protocol for producing
microdroplets using the emulsification technique (Monshipouri
et al., 1995, J. Microencapsulation, 12:255-262). Using an
overhead mixer at 2000 rpm, 0.6 g sodium polyphosphate and 2%
sodium alginate are dissolved in 100 ml sterile water, and
the alginate solution is allowed to degas for 60 minutes. An
oil phase is prepared by mixing 300 ml oil, such as canola or
olive oil, with 1.0 g purified soy bean lecithin for at least
30 minutes. A slurry containing 1.9 g calcium sulphate in 10
ml 50% glycerol is prepared by sonication for at least 15
minutes. This slurry and a volume of library cells which
will yield 1-5 cells per droplet are blended into the
alginate solution immediately before introduction to the oil
phase. The emulsification process is initiated by slowly
transferring the alginate mixture into the oil phase and
mixing for 10 minutes at 580 rpm. 500 ml sterile water is
then added and the mixing allowed to continue for 5 minutes.
Microdroplets can then be removed from the oil by
centrifugation. The microdroplets are washed and resuspended
in a suitable growth medium, ready for culture under standard
conditions if required. The size of the droplets can be
examined by phase microscopy. For the purpose of sorting by
FACS or MACS, if the droplets are outside of the desired size
range necessary for sorting, the droplets can be size
selected using a filter membrane of the required size limit.

According to the invention, components of the reporter
regimen or the target of a drug screen may also be
co-encapsulated in a drop with library cell(s). Whole indicator
cells or cellular fractions containing a bioassay, enzymes
or reporter molecules may be mixed with library cells
suspended in the medium prior to formation of macro- or
micro-droplets as previously described. Compounds of
interest produced by the library cells may accumulate and
diffuse within the droplet to reach the co-encapsulated
indicator cells or reporter, and generate a signal. The
co-encapsulated indicator cell may be a live target of the
desirable compound, e.g. pathogens for anti-infectives or
tumor cells for anticancer agents. Any change in metabolic
status of the indicator cells, such as death or growth
inhibition, constitutes a signal and may be detected within
the droplet by a variety of methods known in the art. Such
methods may include but are not limited to the use of
physiological probes, such as vital stains, or measurement of
optical properties of the drop.

When the droplets are exposed to components of the
reporter regimen, metabolites and compounds produced by the
encapsulated library cells and the reporter components may
diffuse through the semi-solid medium to produce a signal.
For example, a physiological probe may be added to a batch of
droplets which are then subjected to the appropriate sorting
format. If the library cell(s) are allowed to divide within
the drop, the progeny of the original positive cell(s) are
kept together in a microcolony, thereby generating a stronger
signal. It is preferable that the semi-solid medium is
optically compatible with the signal generated by the
reporter, e.g. transparent to light for a range of
wavelengths, so that the signal can be efficiently detected.

Macrodroplets can be sorted using a colorigenic reporter
either by screening by eye or by using any device that allows
the droplets to pass through a screening point, and which has
the capacity to segregate positives. Microdroplets can be
sorted using either FACS or MACS. FACS services are performed
by a qualified operator on any suitable machine (e.g.
Becton-Dickinson FACStar Plus). Particle suspension
densities (cells or droplets) are adjusted to 1 x 10^6
particles/ml. In all cases, positives can be sorted directly
into multi-well plates at 1 clone per well. MACS is
performed using an MPC-M magnetic tube rack following the
manufacturer's instructions (Dynal, 5 Delaware Drive, Lake
Success, New York 11042).

Encapsulated cells which are found to be positive in a
pre-screen or screen can be recovered either by culturing the
droplet on appropriate agar or in liquid growth media, or by
dissolving the droplet in sodium citrate. After a period of
culturing, the positive cells may grow out of the droplet.
For convenience in handling and storage of droplets, the
subsequent culturing may be done in multi-well plates.
Pre-screened positives which have been reduced to a
smaller population can then either be frozen and stored in
the presence of glycerol or grown in multi-well plates.
These can be used to transfer groups of clones using
multi-pin replicators onto various types of assay plates (e.g.
differential media, selective media, antimicrobial or
engineered assay lawns). Specific assays can also be
performed within these microtiter plates and read by a
standard plate reader or any other format used in current
high-throughput screening technologies.

For clarity of discussion, the following subsections
describe in more detail the different embodiments of the
invention involving prokaryotic and eukaryotic, donor and
host organisms. The following embodiments are exemplary and
are not intended to be limiting.

5.3. PROTOCOLS FOR THE PREPARATION OF HIGH
QUALITY NUCLEIC ACIDS FROM DONOR ORGANISMS
The availability of high quality DNA or RNA as starting
material is important in the construction of DNA libraries
that are representative of the genetic information of the
donor organisms. Methods for extracting, selecting and
preparing high quality nucleic acids from cultures of donor
organisms or from environmental samples are provided in this
section. A method for preparing subtracted DNA probes to be
used in pre-screening DNA libraries for the purpose of
enriching DNA related to secondary metabolism is also
described.



5.3.1. ISOTHIOCYANATE NUCLEIC ACID ISOLATION

Lyophilized or non-lyophilized material can be disrupted
by passage through a mechanical grinder, or alternatively by
hand in a mortar and pestle in the presence of fine ground
glass or pumice. Immediately after grinding, ground
lyophilized material may be mixed with 10 ml of lysis buffer
per 1-2 g of material. Lysis buffer is 5M guanidine
isothiocyanate, 50 mM Hepes pH 7.6, 10 mM EDTA, and 5%
β-mercaptoethanol (or 250 mM DTT). After mixing and
incubation at 50°C for 5 minutes, the solution is rendered to
4% sarcosyl, mixed, and incubated for 5 minutes more at 50°C
prior to centrifugation at 8000 g. If the supernatants are
visibly cloudy, a 90-minute centrifugation step at 27,000g may
be used to sediment unwanted carbohydrates. Alternatively, a
15,000g spin may be used to clear the lysate of unwanted
contaminants. Following centrifugation, the supernatant is
made up to 1.42M CsCl (0.15 g CsCl/ml) and layered onto a






previously-made 5.7M CsCl/TE (10 mM Tris-HCl/1 mM EDTA)
solution in ultracentrifugation tubes. Ultracentrifugation
can be carried out at 180,000g for 18 hours, 20°C. After
ultracentrifugation, a clear, jelly-like layer at the
1.42M/5.7M CsCl interface is DNA, while total cellular RNA is
present as a clear pellet at the bottom of the tube.

DNA from the ultracentrifugation step can be dialyzed
against TE buffer, rendered 0.1M NaCl, precipitated with 2.5
volumes of ethanol, dried and redissolved in an appropriate
volume of TE. If the DNA layer is white in color, it can be
removed and recentrifuged for 8 hours in a
CsCl/bisbenzimide gradient to remove remaining
carbohydrates. The dye can be removed by 2-5 washes with 85%
isopropanol, and the DNA dialyzed and treated as above.

RNA can be redissolved in resuspension buffer (5M
guanidine isothiocyanate, 50 mM Hepes, pH 7.8, 10 mM EDTA),
then diluted to 1.33M guanidine isothiocyanate with a
solution of 50 mM Hepes pH 7.8, 10 mM EDTA. If total RNA is
required, the diluted RNA sample is precipitated by the
addition of 2 vol of ethanol or 1 vol of isopropanol. The
precipitated RNA is rinsed with 70% ethanol, dried, and
resuspended in water or formamide, and stored at -70°C until
used.
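
The dilution of the RNA from 5M to 1.33M guanidine
isothiocyanate follows the usual C1V1 = C2V2 relationship. The
short sketch below works out how much Hepes/EDTA diluent that
implies for a given sample volume; the function name and the 1 ml
example volume are illustrative assumptions, not part of the
protocol.

```python
def diluent_volume_ml(sample_ml: float,
                      start_molar: float = 5.0,
                      target_molar: float = 1.33) -> float:
    """Volume of diluent to add so the sample reaches target_molar.

    Uses C1 * V1 = C2 * (V1 + V_diluent), assuming the diluent
    contains none of the solute being diluted.
    """
    if not 0 < target_molar <= start_molar:
        raise ValueError("target must be positive and not exceed the start")
    return sample_ml * (start_molar / target_molar - 1.0)


if __name__ == "__main__":
    # Hypothetical example: 1 ml of RNA in 5M guanidine isothiocyanate.
    print(f"add {diluent_volume_ml(1.0):.2f} ml of Hepes/EDTA solution")
```

In other words, each volume of redissolved RNA needs a little under
three volumes of diluent.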

5.3.2. ISOLATION OF POLY(A)-CONTAINING RNA
Since the vast majority of eukaryotic mRNA molecules
contain tracts of poly(adenylic) acid at the 3' end, up to
250 bases in length, they can be purified by affinity
chromatography using an oligo-dT cellulose matrix. A wide
variety of commercially available oligo-dT matrices may be
used, including but not limited to simple gravity columns,
para-magnetic particles, and spin and push columns.
mRNA may be stored either dissolved in water, in formamide,
or dried at -70°C.






5.3.3. ENRICHMENT OF NON-RIBOSOMAL SEQUENCES FROM
TOTAL RNA

Enrichment of non-ribosomal [...] essential step in
obtaining useful RNA [...] from [...]. Centrifugation on
[...] gradients can be [...] the dominant ribosomal [...]
species (McGookin, 1984, in Methods in Molecular Biology,
Vol. 2, Nucleic Acids, Humana [...] 112). Following
centrifugation the [...] of ribosomal RNA can be discarded,
and the remaining fractions dialyzed and precipitated.

The use of the Klenow [...] polymerase, or other DNA
polymerase which lacks [...] exonuclease activity, to add
nucleotides to the ends is a standard technique often used to
create blunt-ended DNA molecules after [...]. [...]
nucleotide set [...] exploited in creating ligation ends
that are [...] compatible to each [...] libraries and
constructs (Hung et al., [...] Nucleic Acids [...];
Zabarovsky et al., [...] Gene 42:119 [...]; Foster, 1991,
Ph.D. thesis, University of California [...]).

[...] carried out [...] the DNA may be purified by a variety
of methods, including but not limited to affinity
chromatography, ethanol precipitation, and spin-column
centrifugation.

5.3.5. PROTOCOLS FOR PREPARATION OF SUBTRACTED DNA
PROBES FOR PRE-SCREENING

RNA may be isolated from young, mid log-phase cultures
of organisms with complex life cycles that have not undergone
differentiation. This RNA pool is complementary to genes
involved in undifferentiated growth and primary metabolism.
The RNA is biotinylated in vitro and hybridized in excess to
randomly sheared, gene-sized fragments of genomic DNA from
the homologous or closely related heterologous species.
Phenol extraction of this mixture results in the removal of
genomic sequences complementary to primary metabolism RNA at
the interface. This process may be repeated. The
resulting single stranded DNA fragments are composed of the
(+) strand of primary metabolism genes and the (+) and (-)
strands of other genes, including secondary metabolism-
related genes. This mixture of DNA is denatured, and
rehybridized for 5-10 half C0t's under highly stringent
conditions such that only related sequences can rehybridize
to form double-stranded DNA. The remaining single-stranded
DNA can be removed by binding to hydroxyapatite or by
digestion with mung bean nuclease. The isolated double-
stranded DNA representing non-primary metabolism related
genes may then be labeled using random priming, and used as a
probe to pre-screen a library.

5.3.6. PURIFICATION OF NUCLEIC ACIDS FROM SOIL OR
ENVIRONMENTAL SAMPLES

Soil samples are flash frozen in liquid nitrogen and
stored at -70°C until processed. Alternatively, soil samples
are stored frozen at -20°C. Samples are either thawed on ice
immediately prior to use, or freeze-dried prior to
processing.

Total nucleic acids are extracted by a number of
protocols with minor modifications depending on the physical
state and source of the material. Dry to semi-dry samples
are frozen and processed directly; very wet samples are flash
frozen and freeze-dried; oily samples are diluted with
phosphate buffered saline prior to processing. Any of the
following procedures may be adapted: Ogram et al., 1987, J.
Microbiol. Meth. 7:57-66; Steffan et al., 1988, Appl. Environ.
Microbiology, 54:337-363; Werner et al., [...] 5072-5078;
Zhou et al., 1996, Appl. Environ. Microbiol. 62(2):316-322.

Briefly, 5 g samples are lysed directly by dropwise
addition to hot guanidinium isothiocyanate lysis buffer
(Section 5.3.1), and subjected to cesium chloride
centrifugation. Alternatively, the samples are mixed with
13.5 ml of extraction buffer (100 mM Tris-HCl pH 8.0, 100 mM
EDTA, 100 mM sodium phosphate, 1.5M NaCl, 1% CTAB
(hexadecylmethylammonium bromide)) and 100 µl of 20 mg/ml
proteinase K in 50 ml centrifuge tubes and shaken by
horizontal shaking at 225 rpm for 30 minutes at 37°C. After
shaking, 3.5 ml of 20% SDS is added, and the samples are
incubated at 65°C for 2 hours, with gentle end-over-end
shaking every 15-20 minutes. The supernatants are collected
by centrifugation at 6000 x g for 30 minutes at 20°C. The
pellets are re-extracted 3X by adding 4.5 ml of extraction
buffer and 0.5 ml of 20% SDS, vortexing for [...],
followed by a 10 minute incubation at 65°C and
re-centrifugation. The supernatants from all extractions are
combined and extracted twice with chloroform-isoamyl alcohol
(48:1). The nucleic acids are precipitated by the addition of
0.6 volumes of isopropanol followed by a one hour incubation
and centrifugation at 16,000 x g for 20 minutes at room
temperature. The crude nucleic acid pellets are then
resuspended in 10 mM Tris-HCl pH 8.0, 2 mM EDTA. Further
purification of the DNA is by [...] DEAE chromatography if
needed.

Total RNA is obtained from the crude pellet by selective
precipitation of RNA by 4M lithium acetate or acid
extraction (Ausubel et al., 1990, [...] Greene Publishing
Associates and Wiley Interscience, New York; Hoben et al.,
1988, Appl. Environ. Microbiology, 54:703-71).






5.3.7. REPAIR OF DNA
Nicked or degraded DNA samples are repaired by [...]
([...] MgCl2, 40 µM dNTPs, 5 U/10 µg [...]) at [...]°C. The
DNA is ethanol precipitated by the addition of 1/10 volume of
3M sodium acetate and 2.5 volumes of 100% ethanol. After
precipitation and resuspension in water the DNA is treated
with E. coli DNA ligase in E. coli ligase buffer ([...] pH
[...], [...] MgCl2, [...] µM NAD, and 25 µM BSA) with 10 U of
E. coli DNA ligase for 1-2 hours at 16°C. After treatment the
DNA sample is diluted 5 fold with a solution of 20 mM
Tris-HCl pH 8.0, 0.3M sodium acetate and extracted once with
phenol and once with chloroform. The addition of 2.5 volumes
of ethanol to the aqueous phase precipitates the DNA. The
samples are rinsed two times with 70% ethanol and resuspended
in sterile water or 10 mM Tris-HCl, pH 8.0, 1 mM EDTA and
frozen at -70°C until used.

5.4. PROTOCOLS FOR PROKARYOTIC [...]

The procedures for preparing natural pathway expression
libraries and chimeric pathway expression libraries using
prokaryotic host and donor organisms are provided in this
section. Purified high quality DNA obtained by the
techniques described in Sections 5.3.1 to 5.3.4 is used in
the following procedures.

5.4.1. BACTERIAL SPECIES, STRAINS, AND CULTURE
CONDITIONS

Particularly good expression host organisms are
restriction-minus, endonuclease deficient, and recombination
deficient. For E. coli, a preferred strain is XL1-MR
(genotype: McrA-, McrCB-, McrF-, Mrr-, hsdr-, endA1-,
recA-). For Streptomyces, a preferred strain is S. [...]. For
Bacillus subtilis, preferred strains are B. subtilis
PB168 trpC2; B. subtilis PB5002 sacA, degUhy;
PB168delta trpC2, pksdelta 7c; B. subtilis ATCC 39320 and
39374.

The donor organisms are bacterial species. Some [...]
morphologies were performed to the level [...] that the
samples are taxonomically diverse. [...] and at [...]°C for
expression. Marine, Actinomyces and Streptomyces species are
grown only at 30°C.

5.4.2. PREPARATION OF DONOR GENOMIC DNA

The bacteria [...] are harvested by centrifugation and
resuspended in 10 mM Tris, 5 mM [...] as described in Section
[...]. The pellet may be solubilized in SDS/proteinase K,
extracted with phenol:chloroform, and precipitated with
isopropanol. The resulting purified DNA is resuspended
overnight in TE.

Aliquots of each purified DNA are subjected to [...]
requires large native fragments of
genomic DNA. This mixture can optionally be size-
fractionated through sucrose gradients. Smaller fragments of
DNA for the chimeric pathway expression library can
simultaneously be selected by size fractionation.
The digestion and size fractionation are confirmed by
subjecting aliquots of the samples to agarose gel
electrophoresis.

5.4.3. GENERATION OF PROKARYOTIC PROMOTER FRAGMENTS
In one example, synthetic oligonucleotides are used to
construct a fragment containing two copies of the beta-
galactosidase promoter (lac), one on either side of a unique
BamHI site, with each copy of lac positioned to direct
transcription toward the centered BamHI site (Figure 4A).
The synthetic oligonucleotides are phosphorylated by the
synthesizer. 400 ng of each oligonucleotide is annealed by
boiling five minutes and slow cooling over 30 minutes to 25°C
before ligating 30 minutes at room temperature with T4 DNA
ligase. The ligation mix is subjected to agarose gel
electrophoresis and 2-7 kbp fragments are excised and
purified by Gene Clean. The joined, paired, and properly-
oriented cassettes are inserted into the SmaI site of the
pBSK plasmid vector by incubation for 16 hours at 15°C with
T4 DNA ligase in 1X ligase/PEG buffer. The ligation mix is
introduced into XL1-MR cells. Individual clones are analyzed
by restriction enzyme analysis and may optionally be
sequenced to confirm orientation and accuracy.

The pBSK-(lac/lac)n clones (where n is an integer from 2
to 10) are cultured in 0.3 liter quantities and the plasmids
purified using a plasmid preparation kit (Qiagen). 40 µg of
the selected and purified pBSK-(lac/lac)n is digested to
completion with SmaI in 1X buffer. The digested DNA is
subjected to agarose gel electrophoresis and the lac/lac
promoter dimers are excised and purified with Gene Clean, and
digested to completion with BamHI in 1X buffer. See Figures
4B and 4C. The digested promoter monomers are
phenol:chloroform extracted, ethanol precipitated, and
dephosphorylated by treatment with CIAP in 1X CIAP buffer.
The dephosphorylated, digested promoters are extracted and
precipitated as before, and resuspended in TE at a
concentration of 20 ng/µl before storing at -20°C or further
use.



In another example, prepared promoter fragments are
mixed with similarly-prepared linkers that do not contain
promoter sequences, and then used in ligations with the donor
genomic DNA. This allows the generation of cassettes with
only one promoter, in cases where anti-sense transcription is
a consideration.



5.4.4. PREPARATION OF GENE CASSETTES FOR
COMBINATORIAL CHIMERIC PATHWAY EXPRESSION
LIBRARIES

In one example, BamHI-BamHI fragments of genomic DNA
(mean size 3.5 kbp) are mixed with an excess of
dephosphorylated promoter fragments, and then ligated. The
molar ratio of promoters to genomic DNA fragments is 20:1.
The resulting units (lac/genomic DNA fragment/lac) will
have a mean size of approximately 4 kbp. Other prokaryotic
promoters that may be used include other E. coli promoters
(Harley et al., 1987, Nuc Acid Res 15:2343-2361), and
Streptomyces promoters (Strohl 1992, Nuc Acid Res 20:961-
974) for use in Streptomyces species expression hosts. In
hosts with undetermined or significant recombination ability
it is desirable to use a series of different promoters such
that any clone containing several cassettes will contain
several different promoters.
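
To make the 20:1 molar ratio concrete, the sketch below converts it
into a mass ratio, using the fact that for double-stranded DNA the
number of moles is proportional to mass divided by length. The
promoter fragment length of 250 bp is an assumption inferred from
the figures above (a 3.5 kbp insert growing to roughly 4 kbp with
two promoters); it is not stated directly, and the function name is
hypothetical.

```python
def promoter_mass_ng(genomic_mass_ng: float,
                     genomic_size_bp: float = 3500.0,
                     promoter_size_bp: float = 250.0,  # assumed, see lead-in
                     molar_ratio: float = 20.0) -> float:
    """Mass of promoter fragments giving the requested molar excess.

    moles ~ mass / length for double-stranded DNA, so
    mass_promoter = ratio * mass_genomic * (len_promoter / len_genomic).
    """
    if min(genomic_mass_ng, genomic_size_bp, promoter_size_bp) <= 0:
        raise ValueError("masses and sizes must be positive")
    return molar_ratio * genomic_mass_ng * promoter_size_bp / genomic_size_bp


if __name__ == "__main__":
    # Hypothetical example: 1 µg (1000 ng) of 3.5 kbp genomic fragments.
    print(f"{promoter_mass_ng(1000.0):.0f} ng of promoter fragments")
```

Because the promoter fragments are much shorter than the genomic
fragments, a 20-fold molar excess corresponds to only a modest
excess by mass under these assumptions.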






5.4.5. PREPARATION OF SOLID SUPPORT
Ultralink Immobilized Streptavidin beads were purchased 
from Pierce (Cat. No. 53113). 3M Emphaze Biosupport Medium 
AB1 "blank beads" was purchased from Pierce (Cat. No. 53112). 
Similar solid supports from other vendors may be substituted 
for this procedure. 
Oligonucleotides were purchased from Life Technologies
(Gibco-BRL). Oligonucleotide "Bead-link-5" is 5' biotin-GCC
GAC CAT TTA AAT CGG TTA AT 3'. "Bead-link-3" is 5'
phosphate-TAA CCG ATT TAA ATG GTC GGC 3'. When annealed,
these oligonucleotides contain a SwaI restriction
endonuclease site (shown underlined below). Annealed bead-
link oligonucleotides also leave an AT overhang at the 3'
end. This overhang is shown in bold on oligonucleotide
bead-link-5.
5' biotin-GCC GAC CAT TTA AAT CGG TTA AT 3'
3'        CGG CTG GTA AAT TTA GCC AAT 5'
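
The properties claimed for the annealed pair can be checked
directly from the two sequences. The sketch below is only an
illustrative check, with helper names of our own choosing: it
confirms that Bead-link-3 is complementary to the 5' portion of
Bead-link-5, reports the unpaired 3' tail, and locates the SwaI
recognition sequence (ATTTAAAT).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

BEAD_LINK_5 = "GCCGACCATTTAAATCGGTTAAT"  # 5' -> 3', biotinylated at 5' end
BEAD_LINK_3 = "TAACCGATTTAAATGGTCGGC"    # 5' -> 3', phosphorylated at 5' end
SWAI_SITE = "ATTTAAAT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overhang(top: str, bottom: str) -> str:
    """Unpaired 3' tail of `top` when `bottom` pairs flush with its 5' end."""
    paired = reverse_complement(bottom)
    if not top.startswith(paired):
        raise ValueError("bottom strand is not complementary to the 5' end of top")
    return top[len(paired):]


if __name__ == "__main__":
    print("3' overhang:", three_prime_overhang(BEAD_LINK_5, BEAD_LINK_3))
    print("SwaI site starts at index:", BEAD_LINK_5.find(SWAI_SITE))
```

The same helper can be reused to sanity-check any replacement
linker pair before ordering oligonucleotides.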

Equimolar amounts of each bead-link oligonucleotide are
mixed together in an eppendorf tube. 5M NaCl is added to the
tube to a final concentration of 300 mM. The reaction is
incubated at 60°C for 1.5 hr. Annealing was confirmed by
agarose gel electrophoresis using non-annealed
oligonucleotides as a control.

To prepare blank beads, 100 mg dry beads was resuspended
in 1 ml phosphate buffered saline (PBS). Bovine Serum Albumin
(BSA) was added to a final concentration of 1 mg/ml. Beads were
rotated for 4 hrs at room temperature. Beads were pelleted
by centrifugation and washed 3x with 1M Tris-HCl pH 8.0 for 2
hours at room temperature to block unreacted azalactone
sites. Beads were pelleted by brief centrifugation and were
washed extensively with PBS. Blank beads were stored in PBS
at 4°C until used.

To bind bead-link oligonucleotide to streptavidin beads,
10 µg previously-annealed oligonucleotides were mixed with
20 µl Ultralink Immobilized Streptavidin beads in 1X binding
buffer (PBS, 500 mM NaCl). Beads were incubated for three
hours at room temperature with inversion to keep the beads
suspended. Beads are pelleted and washed 3x with 1 ml binding
buffer. Beads are then washed and equilibrated with 1X
ligation buffer (50 mM Tris-HCl pH 7.8, 10 mM MgCl2, 10 mM
dithiothreitol, 1 mM ATP, 25 µg/ml BSA). Beads are stored at
4°C until used.
5.4.6. ASSEMBLY OF A COMBINATORIAL CHIMERIC
PATHWAY EXPRESSION LIBRARY

Attachment of gene cassettes to magnetic beads: The gene
cassettes are phosphorylated using T4 polynucleotide kinase
in 1X kinase buffer. The phosphorylated fragments are
ethanol precipitated and resuspended in TE. 1/10 of this is
ligated to a mixture of two short non-phosphorylated
synthetic linkers. The remaining 9/10 is used for a later
procedure. Each linker will have one of two rare-cutting
restriction sites, either NotI or SrfI. In addition, the
NotI-containing linker is biotinylated at the time of
synthesis of the oligonucleotides. The NotI and SrfI linkers
are mixed with the phosphorylated transcription units in the
ratio, respectively, of 100:100:1, and ligated with T4 DNA
ligase in 1X ligase/PEG buffer for 16 hours at 15°C. This
mixture is allowed to bind to avidin-conjugated MPG magnetic
beads, and the manufacturer's protocols are used to remove
the bead-bound transcription units from the ligation mixture.

In the mixture of ligated DNA, approximately 1/2 will
have a biotinylated NotI linker placed at one end and a SrfI
linker at the other end. The NotI ends will be bound to the
beads by avidin-biotin linkages. The fragments with NotI
linkers at both ends are not involved in further addition
steps. The fragments with SrfI linkers at both ends are not
retained in the magnetic separation step.
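
The "approximately 1/2" figure follows from simple counting if one
assumes that each end of a transcription unit receives a NotI or
SrfI linker independently and with equal probability, which seems
reasonable given the equal, large excess of both linkers. That
independence assumption is ours, not the protocol's. A small sketch
of the enumeration:

```python
from fractions import Fraction
from itertools import product


def end_pair_fractions(p_not: Fraction = Fraction(1, 2)) -> dict:
    """Expected fractions of units by linker pair.

    Assumes each end is capped independently, with probability p_not
    of receiving the NotI linker and 1 - p_not of receiving SrfI.
    """
    p = {"NotI": p_not, "SrfI": 1 - p_not}
    result = {
        "NotI/SrfI": Fraction(0),
        "NotI/NotI": Fraction(0),
        "SrfI/SrfI": Fraction(0),
    }
    for left, right in product(p, repeat=2):
        key = "NotI/SrfI" if left != right else f"{left}/{right}"
        result[key] += p[left] * p[right]
    return result


if __name__ == "__main__":
    for pair, fraction in end_pair_fractions().items():
        print(pair, fraction)
```

Only the mixed NotI/SrfI class is both captured on the beads and
available for extension; the remaining half is split evenly between
the two unproductive classes.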

Preparation of pool of DNA for addition to bead-bound
DNA: The remaining 9/10 of the phosphorylated transcription
units are ligated as above, but to the SrfI linkers only,
followed by digestion to completion with SrfI,
dephosphorylation, purification and ethanol precipitation.

De-protection of bead-bound DNA: Transcription units
bound to the beads are digested to completion with the SrfI
enzyme in 1X SrfI buffer. The reaction is heat-inactivated
and the beads are removed by magnetic separation.

Concatenation: The beads are then added to a ligation
mix containing the dephosphorylated SrfI-SrfI digested
transcription units in 1X ligation buffer. Ligations are
commenced by addition of T4 DNA ligase and proceed for 60
minutes, 25°C, before heat-inactivation of the ligase and
magnetic separation of the beads. Ligations will primarily
occur between phosphorylated bead-bound DNA and non-
phosphorylated transcription units. The transcription units
on the bead are phosphorylated by T4 polynucleotide kinase,
heat-inactivated, magnetically-separated, and returned to the
ligation mixture with the addition of more T4 DNA ligase.
This cycle is repeated ten times before cleaving the
polymer from the beads by digestion with NotI. The cleaved
DNA is ethanol precipitated, resuspended in TE, and viewed on
an agarose gel to gauge the quality and size range before
insertion into the SuperCos 1 or other vector, according to
the expression host. The concatemers are used to generate a
prokaryotic library in the relevant expression host as
described in Section 5.4.5.
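
As a back-of-the-envelope check on the size range produced by the
cycling scheme, the sketch below assumes (our assumption, not a
statement in the protocol) that each cycle adds at most one
transcription unit to a bead-bound chain, since the incoming units
are dephosphorylated and cannot ligate to one another. Unit size is
taken as the approximately 4 kbp mean given in Section 5.4.4, and
the function names are hypothetical.

```python
import math


def max_concatemer_kbp(cycles: int, unit_kbp: float = 4.0) -> float:
    """Upper bound on chain length: the first unit plus one per cycle."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1 + cycles) * unit_kbp


def units_for_insert_range(low_kbp: float, high_kbp: float,
                           unit_kbp: float = 4.0) -> tuple:
    """Smallest and largest whole number of units fitting an insert window."""
    return math.ceil(low_kbp / unit_kbp), math.floor(high_kbp / unit_kbp)


if __name__ == "__main__":
    print("max length after 10 cycles (kbp):", max_concatemer_kbp(10))
    print("units fitting a 30-42 kbp window:", units_for_insert_range(30, 42))
```

Under these assumptions ten cycles give chains of up to eleven
units, so incomplete ligation in some cycles would still leave many
chains within a 30-42 kbp cosmid insert window.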

5.4.7. ASSEMBLY OF A COMBINATORIAL NATURAL PATHWAY
EXPRESSION LIBRARY

The expression vector for an E. coli library is
desirably the cosmid SuperCos 1, capable of maintaining
inserts of 30-42 kbp in size. Insertion of the DNA fragments
into SuperCos 1 and packaging with Gigapack extracts are
performed according to the manufacturer's directions
(Stratagene).

Briefly, XL1-MR host cells are infected with SuperCos 1
phage containing the DNA library. This is performed as
follows: XL1-MR cells are grown overnight in 5 mL LB medium
with 1% maltose, 10 mM MgSO4 at 300 rpm, 37°C. The overnight
culture is diluted 1:10 and cultured 3 hours in LB/10 mM MgSO4
at 300 rpm, 37°C. The culture is pelleted by centrifugation
at 800 x g and resuspended in 5 mL LB. 600 µl of this suspension
is incubated with 500 cfu of library packaged in phage
particles for 30 minutes at ambient temperature, followed by
a 60 minute incubation with 8 vol LB at 300 rpm, 37°C.

In order to amplify the expression libraries, the
infected host cells are spread on 150 mm Petri dishes with
50 mL LB, 50 µg/mL ampicillin. The plates are previously dried
for 48 hours at ambient temperature. After spreading, the
plates are allowed to incubate overnight at 37°C. The plates
are scraped and the colonies resuspended with 3 mL
15% glycerol, 85% LB per plate. This bacterial suspension is
stored at -70°C for further use.

To prepare the libraries for screening individual
clones, the infected host cells are spread on 150 mm Petri
dishes with 50 mL LB, 50 µg/mL ampicillin. The plates are
previously dried for 48 hours at ambient temperature. After
spreading, the plates are allowed to incubate overnight at
37°C. Resulting colonies are picked with sterile toothpicks
and transferred one per well to multi-well plates. Each well
of a 384-well plate contains 75 µL LB, 50 µg/mL ampicillin, 7%
glycerol. The outer rows (80 wells total) are not inoculated
but are similarly filled with medium to provide an
evaporation barrier during subsequent incubation and
freezing. These inoculated master plates are placed at 37°C
for 16 hours without shaking. The overnight master 384-well
plates are used as a source plate to replicate into one or
more working 384-well plates or Omni-Trays. The master 384-
well plates are then sealed individually and frozen at -80°C.
Replication is done with a 384-pin replicator. Before and
after each use, the 384-pin replicator is dipped sequentially
into bleach for 20 seconds, water for 30 seconds, then
ethanol for 5 seconds before flaming. Methods of library
assembly are dependent on the selection of vector and
expression host.

5.4.8. PRE-SCREENING OF EXPRESSION LIBRARIES

There are three categories of pre-screens: intracellular,
differential, and selection.

Briefly, the first category, intracellular pre-screening,
entails introduction of the library into a host engineered to
contain a chemo-responsive reporter construct. The reporter
is GFP (green fluorescent protein) or β-galactosidase, and




selection is done by fluorescence-activated cell sorting
(FACS) or macrodroplet sorting.

The second category, differential pre-screening, entails
incubation of the library in the host with fluorescent or
chromogenic physiological tracers, followed by FACS or
macrodroplet sorting.

The third category, selection pre-screening, entails
incubation of the library in the host with selective agents
such as antibiotics, followed by FACS or macrodroplet sorting
to identify surviving or multiplying cells.

For all methods, cell sorting is done on bulk cultures
of amplified libraries prior to examination of individual
cultures.

The libraries may be pre-screened by FACS or
macrodroplet sorting. Pools of host cells containing the DNA
libraries are cultured in one of two formats promoting either
high or low density micro-environments.

In the first format, cells of the amplified library are
examined as individual cells. An E. coli library aliquot is
grown for 4 hours at 30°C in 20 vol medium at 300 rpm before
pelleting, resuspension in 1 vol sterile ddH2O, incubation
with fluorescent probes (as needed), and placement on ice for
transfer to the FACS facilities.

In the second format, aliquots of the amplified library
are encapsulated and cultured in the presence of substrates
or selection agents as described in Section 5.2.3 before
transfer to the FACS or macrodroplet sorting facilities.

For cultures to be examined with fluorescent tracers or
substrates, the cultures, resuspended in ddH2O, are stained
before FACS following the manufacturer's protocols, typically
as follows: incubations are in the dark, at room temperature,
for 15 minutes, followed by pelleting for 5 minutes in a
1.5 mL microfuge tube and resuspension in 1 vol cold ddH2O.

After sorting, pools of selected 1-1000 clones or
macrodroplets from the expression libraries are cultured in
0.5 L nutrient media. The cultured bacteria and media are
processed for chemical analysis by extraction with 0.5 L ethyl
acetate. Rotary evaporation yields a crude organic extract
of approximately 20 mg-1 g extract per liter culture. The
cognate cloned DNAs are purified and re-transformed into host
cells to confirm the localization of relevant sequences to
the cosmid. Chemical samples generated by expression from
library clones may be examined by HPLC using a series of
columns (cationic, anionic, reverse phase) and subsequently
by qualitative chemical analysis using NMR.

5.4.9. METABOLIC TESTING OF MARINE GRAM (-)/E. COLI
LIBRARY BY PLATE REPLICATION

Each wild-type marine species is tested prior to 
preparation of the DNA libraries to prevent redundancy and to 
help determine the array of metabolic tests to be done on the 
15 completed libraries. 

To prepare the libraries for screening individual
clones, the infected host cells, such as E. coli XL1-MR, are
spread on 150 mm Petri dishes with 50 ml LB, 50 µg/ml
ampicillin. The plates are previously dried for 48 hours at
ambient temperature. After spreading, the plates are allowed
to incubate overnight at 37°C. Resulting colonies are picked
with sterile toothpicks and transferred one per well to
384-well plates. Each well contains 75 µl LB, 50 µg/ml
ampicillin, 7% glycerol. The outer rows (80 wells total) are
not inoculated but are similarly filled with medium to
provide an evaporation barrier during subsequent incubation
and freezing. These inoculated master plates are placed at
37°C for 16 hours without shaking. The overnight master
384-well plates are used as a source plate to replicate into
one or more working multi-well plates or Omni-Trays. The
master 384-well plates are then sealed individually and
frozen at -80°C. Replication is done with a 384-pin
replicator. Before and after each use, the 384-pin
replicator is dipped sequentially into bleach for 20 seconds,
water for 30 seconds, then ethanol for 5 seconds before
flaming.

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Working mult i -well plates or Omni -Trays are used as 
source plates to replicate the DNA libraries onto a series of 
differential and/or selective media (e.g. siderophore 
detection media or antimicrobial lawns) . The results are 
5 compiled and compared to the profiles of the wild- type marine 
bacteria used to construct the DNA library. 



5.4.10. METABOLIC TESTING OF MARINE GRAM (-)/E. COLI
LIBRARY BY MACRODROPLET ENCAPSULATION

Clones are encapsulated by taking sodium alginate and
dissolving it in 100 mL of sterile water at a concentration of
1% using an overhead mixer at 2000 rpm. A volume of library
suspension is added so as to embed 1-5 clones per droplet.
The mixture is allowed to sit for at least 30 minutes to
degas. The mixture is then extruded through any device that
allows it to form individual droplets. One such example is a
syringe with a 25 gauge needle. The droplets are dropped into a
gently stirring beaker of 135 mM calcium chloride. Droplets
are allowed to harden for 10 minutes and then are transferred
to a sterile flask, and the calcium chloride is removed and
replaced with LB/Amp media and a substrate (e.g.,
x-glucosidamine). Flasks containing the droplets are then
shaken at 30°C overnight and examined the following morning
for positive clones indicated by the presence of blue
colonies.
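
The target of 1-5 clones per droplet can be checked with a simple occupancy estimate. The following is a minimal sketch only. It assumes, which the text does not state, that clones distribute independently among droplets so that the count per droplet follows a Poisson distribution; the function names and the mean loadings tried are illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a droplet holds exactly k clones (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_in_range(mean: float, low: int = 1, high: int = 5) -> float:
    """Fraction of droplets holding between low and high clones, inclusive."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))


def empty_fraction(mean: float) -> float:
    """Fraction of droplets that contain no clone at all."""
    return math.exp(-mean)


if __name__ == "__main__":
    # Hypothetical mean loadings (clones per droplet), for illustration only.
    for mean in (0.5, 1.0, 2.0, 3.0):
        print(
            f"mean={mean:.1f}  empty={empty_fraction(mean):.3f}  "
            f"1-5 clones={fraction_in_range(mean):.3f}"
        )
```

Under this model a higher mean loading leaves fewer empty droplets but raises the share of droplets holding several clones, which then need a further round of screening to resolve.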

Droplets are placed in a single layer in a large, clear
tray and scanned by eye. Positive colonies are removed and
placed in 96-well master plates containing LB/Amp and 50 mM
sodium citrate pH 7.4 to dissolve the droplet, and allowed to
grow at 37°C overnight. These overnight master 96-well plates
are used as a source plate to replicate into one or more
working multi-well plates or Omni-Trays. The master 96-well
plates are then sealed individually and frozen at -80°C.
Positive clones can then be either sent for specific testing
of the products or sent through another round of
pre-screening or screening. Further screening may be
performed by replication, which is done with a multi-pin
replicator. Before and after each use, the multi-pin
replicator is dipped sequentially into bleach for 20 seconds,
water for 30 seconds, then ethanol for 5 seconds before
flaming.

5.4.11. METABOLIC TESTING OF MARINE GRAM (-)/E. COLI
LIBRARIES BY MICRODROPLET ENCAPSULATION

Microdroplets may be generated by the following method.
Using an overhead mixer at 2000 rpm, 0.6 g sodium
polyphosphate and 2% sodium alginate are dissolved in 100 ml
sterile water. This mixture is allowed to degas for 60
minutes. Then 1.9 g calcium sulphate is sonicated in 10 ml
50% glycerol for at least 15 minutes. This slurry and a
volume of the library suspension which will yield 1-5 cells
per droplet are blended into the alginate solution
immediately before introduction to an oil phase (olive oil)
which has been premixed with 1.0 g purified
soy bean lecithin for at least 30 minutes. The
emulsification process is initiated by slowly transferring
the alginate mixture into the oil phase and mixing for 10
minutes at 580 rpm. 500 ml sterile water is then added and
the mixing allowed to continue for 5 minutes. Microdroplets
can then be removed from the oil by centrifugation and washed
and resuspended in LB/Amp. For the purpose of sorting by
FACS, if the droplets are outside of the desired size range
necessary for sorting, the droplets can be size selected
using a filter membrane of the required size limit. Clones
can then be grown 2 hours at 30°C with shaking in LB/Amp
media containing a fluorescent substrate.

Following incubation the sample is prepared for sorting
with FACS by centrifuging, washing and resuspending in
sterile water at a density of 1 x 10^6 droplets per ml. The
size of the droplets can be examined by phase microscopy.
FACS services are performed by a qualified operator on a
Becton-Dickinson FACStar Plus and positives are sorted
directly into multi-well plates containing LB/Amp, isolating
positives to 1 clone per well. These plates are allowed to
[...] until the colonies grow out of the beads (1-2 [...]).
These overnight plates are used as a source plate to
replicate into one or more working multi-well plates or
Omni-Trays. The master multi-well plates are then sealed
[...]. Positive clones can then [...] flaming.

5.4.12. METABOLIC TESTING OF [...] S. LIVIDANS LIBRARY
BY PLATE REPLICATION

Each cultivable wild-type actinomycete species is tested
prior to preparation of the DNA libraries to prevent
taxonomic redundancy, and to help determine the array of
metabolic tests to be done on the completed libraries.

To prepare the libraries for screening [...], the
transformed host cells, Streptomyces lividans TK64, are
spread on 150 mm Petri dishes with F10A. The plates are
previously dried for 48 hours at ambient temperature. After
spreading, the plates are allowed to incubate overnight at
30°C. Selection is initiated by overlaying with
thiostrepton. Resulting colonies are picked with sterile
toothpicks and transferred one per well to 96-well plates.
Each well contains F10A media. These inoculated master
plates [...]. The overnight master 96-well plates are
used as a source plate to replicate into
one or more working multi-well plates or Omni-Trays. The
master 96-well plates are then sealed individually and
frozen at -80°C. Replication is done with a multi-pin
replicator. Before and after each use, the multi-pin
replicator is dipped sequentially into bleach for 20 seconds,
water for 30 seconds, then ethanol for 5 seconds before
flaming.

Working multi-well plates or Omni-Trays are used as
source plates to replicate the DNA libraries onto a series of
differential and/or selective media (e.g., [...] antibiotic
plates or antimicrobial lawns). The results are compiled and
compared to the profiles of the wild-type [...] bacteria used
to construct the DNA library.



5.4.13. METABOLIC TESTING OF [...]

[...] positive clones indicated by the presence of
blue colonies.

[...] clear [...] placed in 96-well master plates
containing F10A, 50 mM sodium citrate pH 7.4 to dissolve the
droplets, and then grown at 30°C for 2 days. These [...] are
used as a source plate to replicate into one or more working
multi-well plates [...] then sealed individually and [...]
pre-screening or screening. Further screening [...] as
described above.



5.4.14. [...] CO-ENCAPSULATION

Pools of library clones are [...] alginate to result in
approximately 1 cell per macrodroplet. In addition, adequate
indicator cells are included to result in approximately 50
target cells per droplet. Macrodroplets are produced as
described in Section 5.4.10, and cultured under appropriate
conditions for the library and indicator cells.

In general, S. lividans library macrodroplets are
cultured at 30°C in R5 or F10A, and E. coli library
macrodroplets are cultured at 30-37°C in LB or B3. The media
and temperature may be adjusted to accommodate the
physiological needs of the indicator cells. To visualize the
effects the library cell has on the indicator cells, the
following reporter regimens are utilized: to detect cell
death, inclusion of neutral red or congo red; to detect cell
viability, inclusion of a substrate relevant to the indicator
cell (e.g., X-glucopyranoside for E. faecalis); to detect
β-galactosidase reporter activity in response to promoter
activation, inclusion of 80 mg/ml X-gal in culture media.
After isolation of positive macrodroplets as described in
Section 5.4.10, indicator cells are eliminated by addition of
antibiotics that are selective for the library cells but not
the indicator cells. The library cells are then stored
and/or further examined as desired.

5.5. PROTOCOLS FOR EUKARYOTIC EXPRESSION LIBRARIES
This section describes procedures that may be generally
applied to prepare combinatorial gene expression libraries of
eukaryotic donor organisms. The steps involved in the
preparation of a combinatorial chimeric pathway gene
expression library in eukaryotes are shown in Figures 5A-5F.

Particularly good eukaryotic expression host organisms
are stable, non-filamentous, and characterized sufficiently
so as to be genetically manipulatable for the purposes of
gene expression. For yeast and fungi, a preferred species is
S. pombe, which is grown at 30°C (C. Guthrie and G.R. Fink,
Guide to Yeast Genetics and Molecular Biology, Methods in
Enzymology, Vol. 194, Academic Press). For plants, A. thaliana
and N. tabacum cells are preferred hosts (C.P. Lichtenstein &
J. Draper, Genetic Engineering of Plants, DNA Cloning Vol. II,
pp. 67-119).

5.5.1. REMOVAL OF SATELLITE GENOMIC DNA BY DENSITY
GRADIENT CENTRIFUGATION

Eukaryotic genomes often have large amounts of
repetitive DNA which consists primarily of ribosomal coding
regions, or sequences of no apparent function. Thus, in
preparing genomic DNA from eukaryotic donor organisms, it may
be desirable to exclude such non-coding DNA sequences from a
library. Standard CsCl genomic DNA purification methods in
the presence of the DNA binding dye, Hoechst 33258 (Cooney &
Matthews, 1984) may be used to separate out various classes
of genomic DNA prior to cloning.

5.5.2. GENERATION OF EUKARYOTIC PROMOTERS AND
TERMINATOR FRAGMENTS

Both promoter and terminator gene fragments may be
produced by PCR using sequence-specific primers adapted from
published sequences of known promoters and terminators. The
choice of promoter and terminator sequences can be determined
by the host organism used. For instance, if S. pombe is used
as an expression host, both native promoters, such as mat 1
or ura 4, and non-native promoters, such as those derived from
viruses, e.g., CMV, SV40 (Forsburg, 1993 Nuc Acid Res.
8:4321-4325), or from humans, e.g., chorionic gonadotropin or
somatostatin (R. Toyama, H. Okayama 1990, FEBS Letters 268(1)
pp. 217-221), may be used. Genetically engineered promoters
similar to those found in the inducible tetracycline system
(Faryar et al. 1992, Curr Genet 21:345-349) may also be used.

PCR reactions may be performed in a commercially
available PCR machine using standard PCR reaction conditions
and DNA polymerases of high fidelity and throughput, such as,
but not limited to, Pfu polymerase (Stratagene) or Vent
polymerase (New England Biolabs). Since not all primer sets
will use the same reaction conditions, precise conditions may
be determined empirically by techniques known in the art.
PCR oligonucleotide primers may be obtained commercially or
synthesized by methods well known in the art.

The promoter and terminator fragments generated by PCR
may comprise restriction sites at the 5' ends. Bgl II, Xho
I, and BamHI are used herein to illustrate the principle of
the invention. Any restriction sites may be used as long as
the site does not appear within the promoter or terminator
gene sequences.

To generate cloning sites compatible with cDNA or genomic
DNA inserts, cleavage of the promoter gene fragments with Bgl
II and Xho I will generate promoter gene fragments which have
a Bgl II site at their 5' ends and an Xho I site at their 3'
ends. Terminators are cut only with Xho I and will have only
an Xho I site at their 5' end. 5' and 3' orientations are
based on the expected direction of transcription across the
promoter or terminator gene fragment. See Figure 5B.

Partial fill-in reactions utilizing the large fragment of
E. coli DNA polymerase I (Klenow fragment) and a subset of
deoxynucleotides (in this case dCTP and dTTP) may be used to
generate promoter and terminator fragments that are incapable
of self-ligation by their Xho I ends. The Bgl II ends of the
promoter fragments cannot be affected because of the lack of
base-complementarity, and the BamHI end of the terminator
fragments has no exposed 5' end for the Klenow fragment to
utilize.
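
The effect of a partial fill-in on a 5' overhang can be traced base by base. The following is a minimal sketch, not part of the described procedure. It assumes the standard four-base 5' overhangs left by Xho I (TCGA) and BamHI (GATC), which are not spelled out in the text, and it models the Klenow fragment as extending the recessed 3' end until it meets a template base whose complementary nucleotide was not supplied. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill_in(overhang: str, dntps: str) -> str:
    """Return the 5' overhang (written 5'->3') left after a partial fill-in.

    The polymerase copies the overhang starting from the base next to the
    duplex (the 3' end of the overhang string) and stops at the first
    template base whose complement is not among the supplied dNTPs.
    """
    remaining = len(overhang)
    while remaining > 0 and COMPLEMENT[overhang[remaining - 1]] in dntps:
        remaining -= 1
    return overhang[:remaining]


def can_anneal(a: str, b: str) -> bool:
    """True if two non-empty 5' overhangs are complementary to each other."""
    return a != "" and a == reverse_complement(b)


if __name__ == "__main__":
    xho = partial_fill_in("TCGA", "CT")  # Xho I end, dCTP + dTTP supplied
    bam = partial_fill_in("GATC", "GA")  # BamHI end, dGTP + dATP supplied
    print("Xho I end after fill-in:", xho)
    print("BamHI end after fill-in:", bam)
    print("Xho I end self-anneals:", can_anneal(xho, xho))
    print("Xho I end anneals to BamHI end:", can_anneal(xho, bam))
```

In this model the filled Xho I end keeps a two-base overhang that is no longer self-complementary, while it remains complementary to a BamHI-type end filled in with dGTP and dATP.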

Treatment with a phosphatase, such as calf intestine
alkaline phosphatase, will prevent Bgl II self-ligations, and
provide similar termini for ligations in both the promoter
and terminator fragments. cDNA fragments are protected from
digestion with Not I by incorporation of 5-methyl-dCTP during
first strand synthesis (Short, J.M. 1988, Nuc Acids Res
16:7583-7600).

In an alternative embodiment of the invention, when DNA
inserts are derived from mRNA, directional cloning may be
applied to improve the efficiency of cloning. The cDNA
inserts can be unidirectionally ligated in the sense
orientation with respect to the promoter and terminator
fragments. This can be achieved by generating different,
non-ligatable ends on both promoter and terminator fragments.
Bgl II, Xho I, Xma I, and BamHI are used to illustrate the
invention. Any pair of enzymes that generate compatible ends
and can be protected by methylation can be used.

An Xma I site is substituted for the Xho I site at the 5'
ends of the terminator fragments, while the preparation of
the promoter fragments is unchanged. Xma I is used because
it is compatible with Not I by filling in with Klenow
fragment and dCTP. This results in a terminator fragment
that has a two-base dCTP-dCTP 5' overhang, which is
compatible with suitably prepared Not I digested cDNA gene
fragments. See Figure 5A.

5.5.3. PREPARATION OF DNA INSERTS

Coding gene fragments for the eukaryotic libraries will
be derived from two principal DNA sources, namely
genomic DNA (gDNA) or complementary DNA derived enzymatically
from messenger RNA (cDNA). Strategies for preparation of
gDNA or cDNA are very similar, but not identical.

Complementary DNA is made from messenger RNA and/or
total RNA using standard protocols available in the
literature, or particular to a manufacturer's instructions.
Isolation of total RNA may be accomplished simultaneously
with genomic DNA by the guanidinium isothiocyanate method
described in Section 5.3.1, and mRNA can be isolated by
subsequent affinity chromatography over oligo-dT cellulose.

First strand cDNA synthesis can use an oligo-dT DNA
primer that contains a cloning site, e.g., a Not I site, at
the 5' end. An oligonucleotide of random sequence, which
contains an internal Not I site near its 5' end, can also be
used for randomly-primed first strand synthesis. The use of
this alternative primer avoids 3' bias for large mRNAs.
Methylated deoxynucleotide, such as 5-methyl-dCTP, may be used
with a polymerase such as Pfu to provide protection from
restriction digestion (Short et al., supra; G.L. Costa, 1994,
Strategies 7:8). Only non-methylated sites present in the
primers will be available for cleavage, thus ensuring
[...] produced by treatment with methylation, but the
[...] sequence-specific adapters, such
[...] when annealed to its partner
[...] which has a 5' phosphate [...]
cDNA has [...]
or T4 DNA polymerase as in standard protocols. After
ligation of modified BamHI adapters and digestion of the cDNA
with Not I, the adapted cDNA can be treated with Klenow
fragment and dGTP, generating a defined, directionally
oriented cDNA gene insert ready for ligation to suitably
prepared promoter and terminator fragments. The [...]
of the fragments is such that the 5' end of the cDNA is
located toward the 3' end of the promoter, and the 3' end of
the cDNA is located toward the 5' end of the terminator.

[...] enzyme, such as Sau 3A I [...]
purpose, and partial digestion followed by [...]
sucrose gradients is a very standard technique. Fragment
pools from three different digestions that vary in the
concentration of initial enzyme can be used to allow for
differences in enzyme sensitivity within the [...].

Following size fractionation and purification, the
fragments can be treated with BamHI methylase to protect any
internal BamHI sites, followed by treatment [...]
fragment and dGTP [...]. This results in gene fragments
that are internally methylated [...]
only dATP-dGTP overhangs. See Figure [...].



WO 96/34112 



PCT/US96/06003 



Suitably prepared cDNA, promoter and terminator
fragments can be ligated at 16°C overnight. A ratio of [...]
promoter (P) : [...] cDNA : 10 terminator (T) may be [...] the
ligation reaction. The optimal ratio may be determined
empirically by techniques known in the art. The [...]
cloning procedure provides only one ligation [...]
correctly oriented promoter-sense insert-terminator [...].

Ligation of prepared genomic DNA, promoter, and
terminator gene fragments may be carried out at 16°C with
varying ratios. Since none of the ligation components can
self-ligate, the optimal ratios may be determined
empirically. It is estimated that [...] products formed are
directly useable, 1/4 of the products formed cannot enter
the rounds of ligations, and 1/4 [...].

The following combinations (P = promoter, frag = [...])
[...]:

1. P-frag-T    5. P-garf-T
2. T-frag-P    6. T-garf-P
3. P-frag-P    7. P-garf-P
4. T-frag-T    8. T-garf-T

[...] combinations 1, 6 & 2, 5 represent the desired constructs
[...] orientation for any given gene (1 and 6) [...].
Terminator/terminator gene cassettes may form, but
cannot [...] because of the [...] uncut
BamHI end at their 3' termini.

Promoter/promoter constructs will clone in subsequent
ligations only to other exposed BamHI ends, because the Bgl
II end lacks a 5' phosphate (first round). Subsequent
ligations to the exposed Bgl II end should be rare with
incoming gene cassettes because of the lack of 5' phosphates.
Exposed BamHI ends will only be made possible on resident
forming chains and not on incoming new gene cassettes. Thus
it is expected that such promoter/promoter gene cassettes
will terminate a chain by circularization with a nearby BamHI
site on another chain; such circularizations are
non-recoverable. If such promoter/promoter fragments become a
significant problem to ligation efficiencies, then an
intermediate kinase treatment of the fixed growing chains
prior to addition of new gene cassettes should allow the
promoter/promoter fragments to extend the growing chains by
forming Bgl II/Bgl II ligation products. The kinase
treatment will promote Bgl II/Bgl II and Bgl II/BamHI
ligations on the solid phase, which will circularize the
growing chains involved.

5.5.5. SERIAL LIGATIONS OF GENE CASSETTES TO FORM
CONCATEMERS

Ligation of the gene cassettes, each consisting of
either genomic DNA or cDNA insert flanked by
promoter/terminator combinations, will be performed in a
method analogous to that outlined previously for prokaryotic
DNAs. The major difference here is that this strategy uses
the endonuclease BamHI to create exposed 3' restriction sites
for subsequent cloning. The use of either BamHI methylase or
5-methyl-dCTP ensures that BamHI sites within the insert DNA
will be protected. See Figure 5E.

After 5-10 rounds of chain ligation, the growing chains
of concatemers will be deprotected with BamHI and prepared
for ligation to the expression vector by treatment with the
Klenow fragment and dATP and dGTP. This will render all ends
of the growing chain incapable of ligating to each other,






thus eliminating any circularization and loss of concatemer
chains.

Vector DNA can be ligated to concatemer chains in a 5:1
molar ratio. Other ratios may also be used. This can be done
at 16°C for 8-12 hours, or at 22°C for four hours. Following
ligation the beads can be washed and resuspended in intron
nuclease restriction buffer. Digestion will be carried out
as described by the manufacturer's instructions. Any intron
nuclease may be used. The enzyme CeuI is preferred because it
produces non-palindromic 3' overhangs, which are useful in
preventing self-ligations. See Figure 5F.
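
A molar ratio is converted to masses once the lengths of the two DNA species are known. The following is a minimal sketch, reading the 5:1 ratio as vector to concatemer (an assumption) and using the usual approximation that, for double-stranded DNA, the number of molecules is proportional to mass divided by length. The function name and the example masses and lengths are hypothetical.

```python
def vector_mass_for_ratio(
    concatemer_ng: float,
    concatemer_bp: int,
    vector_bp: int,
    molar_ratio: float = 5.0,
) -> float:
    """Vector mass (ng) giving `molar_ratio` vector molecules per concatemer.

    Uses moles ~ mass / length, so the base-pair molecular weight cancels.
    """
    if concatemer_bp <= 0 or vector_bp <= 0:
        raise ValueError("DNA lengths must be positive")
    return molar_ratio * concatemer_ng * vector_bp / concatemer_bp


if __name__ == "__main__":
    # Hypothetical example: 100 ng of 20 kb concatemer and a 10 kb vector.
    print(vector_mass_for_ratio(100.0, 20_000, 10_000))
```

Because the concatemer chains vary in length, an average length would have to be assumed in practice, which is one reason other ratios may need to be tried.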

5.5.6. CIRCULARIZATION AND TRANSFORMATION OF
VECTOR CONTAINING CONCATEMERS

Concatemer-vector molecules released from the solid
phase can be encouraged to undergo intra-molecular ligation
by dilution of the CeuI digestion mix 100-fold with 1X ligase
buffer. T4 ligase can be added, and the reactions may be
carried out at 22°C for 4-6 hours, or 16°C overnight. See
Figure 5F. The resulting constructs may be concentrated by
microfiltration or freeze-drying, and introduced into either
S. pombe strains, or alternatively into E. coli or S.
lividans strains by standard methods. Any method may be
used, including but not limited to electroporation and
modified calcium-phosphate transformation methods.

5.5.7. [...] AND LIGATION OF PREPARED VECTOR
FOR EXPRESSION IN YEAST

This section describes procedures that may be generally
applied to prepare combinatorial gene expression libraries
using yeast as the host organism.

For preparing a library in S. pombe, one possible
vector, but certainly not the only vector, is the E. coli/S.
pombe shuttle vector pDblet (Brun et al. 1995, Gene 164:173-
177). This vector has the advantage of having multiple
cloning sites and f1 phage origins, being expressed at
[...] invention, the multiple cloning site
[...] of known sequence. See Figure [...].
[...] intron nuclease enzyme that is used to release the [...]
chain from the solid phase generates 3' nucleotide overhangs
of a defined sequence (3' GATT...). An engineered BstXI site
having the sequence CCACCTAACTGG generates the appropriate
CTAA-3' overhang after cleavage.
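
The overhang produced by the engineered site can be read off from the enzyme's cut positions. The following is a minimal sketch. It assumes the published BstXI specificity CCANNNNN^NTGG with a four-base 3' overhang, which the text itself does not spell out; the function name is illustrative.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# BstXI recognition sequence: CCA, six unspecified bases, TGG (assumed).
BSTXI_SITE = re.compile(r"CCA[ACGT]{6}TGG")


def bstxi_overhangs(site: str) -> tuple:
    """Return the two 3' overhangs produced by cutting a 12-base BstXI site.

    The top strand is cut after base 8 and the bottom strand after base 4
    (top-strand coordinates), so bases 5-8 are single-stranded on each side.
    The first value is the top-strand overhang written 5'->3'; the second
    is the complementary bottom-strand overhang written 3'->5'.
    """
    if not BSTXI_SITE.fullmatch(site):
        raise ValueError("not a 12-base BstXI site: " + site)
    top = site[4:8]
    bottom_3_to_5 = "".join(COMPLEMENT[base] for base in top)
    return top, bottom_3_to_5


if __name__ == "__main__":
    top, bottom = bstxi_overhangs("CCACCTAACTGG")
    print("top-strand 3' overhang (5'->3'):", top)
    print("bottom-strand 3' overhang (3'->5'):", bottom)
```

For the sequence given in the text this yields a CTAA-3' overhang on one fragment, and the complementary strand read 3' to 5' is GATT, matching the overhang quoted for the intron nuclease.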

To modify pDblet, it can be first cut with SacI/NotI
to remove the existing BstXI site which does not have the
correct sequence. The pDblet plasmid, once purified by spin-
chromatography or other means, can be mixed with a
presynthesized oligonucleotide which contains, in addition to
a correct sequence for the BstXI site, a new NcoI site and
SacI- and NotI-compatible overhangs. See Figure 6C. After
ligation and transformation, mini-preps of clones are checked
for correctness by digestion with NcoI. Correct clones will
be identified by the presence of both a BstXI and NcoI site.
Treatment of this modified pDblet with BstXI followed by
XhoI generates a vector that contains a 5' XhoI site
and a 3' CTAA BstXI overhang. See Figure 5E. This cleaved
vector can be treated with Klenow fragment and dCTP and
dTTP to render it incapable of ligating to itself. Such a vector
may be used to accept the concatemer chains.

5.5.8. PLANT EXPRESSION LIBRARIES
This section describes procedures that may be generally
applied to prepare combinatorial gene expression libraries
using plant cells as donor and/or host organisms.

For preparation of donor DNA from plants, the following
general procedure is applied: (1) pretreatment of the plant
tissue in cold ether to enhance cell disruption; (2)
mechanical homogenization of the tissue by grinding with
sand, glass beads or aluminum oxide; (3) filtration through a
mesh to remove cell debris; and [...] of the DNA by
the procedures described in [...].
[...] is modified as described [...] purified
[...] or nopaline synthase promoter, and nopaline synthase
[...] vector as described in [...].

A preferred plant DNA vector is [...]
which uses T-DNA borders [...] of the
vir region of [...] tumefaciens to [...]
nuclear genome of [...]
(Clontech, Palo Alto, [...]).
[...] are introduced into protoplast cells [...]
polyethylene glycol as described in [...] et al. 1988 in
"Methods for [...]"
[...] described [...] encapsulated for [...].

6. EXAMPLE: CONSTRUCTION AND SCREENING OF
COMBINATORIAL GENE EXPRESSION LIBRARY

The following subsections describe the preparation and
pre-screening [...]
Streptomyces lividans, E. coli and S. pombe as host
organisms. [...] of the library cells
display metabolic activity of the donor organisms, indicating
that potentially interesting donor metabolic pathways are
functional in the host organism. In addition, it is shown
that one library clone contains DNA encoding a marine
bacterial protein that shares sequence homology to a known
enzyme in a metabolic pathway.

6.1. MATERIALS AND METHODS
Reagents useful in the present method are generally
commercially available. For example: Gene Clean, Genome kit
(Bio101, Vista, CA); restriction enzymes, PCR reagents, and
buffers (Promega, Madison, WI; New England Biolabs;
Stratagene, La Jolla, CA); TA cloning kit (Invitrogen, La
Jolla, CA); bacterial media (Difco, Inc.); [...] (Hawaiian
Marine Imports, Inc.); pBSK plasmid, XL1-MR cells, SuperCos 1
cosmid, Gigapack packaging extracts (Stratagene, La Jolla,
CA); Qiagen QIAprep plasmid purification kit (Qiagen, Inc.,
Chatsworth, CA); [...]-conjugated magnetic porous glass (MPG)
beads (CPG, Inc., New Jersey); Petri dishes, 96- and 384-well
plates, Omni-Trays (Nunc); 96- and 384-pin replicators and
[...] (V & P Scientific, San Diego, CA); ampicillin (IBI,
Inc. [...]); [...] fluorescent protein and GFP cDNA (Clontech,
Inc.); oligonucleotides (Genset, La Jolla, CA); bacterial
species and DNA sequences not elsewhere designated (American
Type Culture Collection, Rockville, MD); 7-ethoxy-[...]-
coumarin, BCECF-AM (Molecular Probes, Oregon); 3-methyl
benzoate, 3-chlorotoluene, m-toluate, tetracycline,
chloramphenicol, acetaminophen, arsenic, [...] and other
chemicals unless noted [...]; Dynabeads, MPC-M (Dynal, Inc.,
[...]).

6.1.1. MEDIA PREPARATION

Purified water (ddH2O) for [...]. [...] seawater [...]
Scripps Institute of Oceanography, La Jolla, [...] and filtered
before use. Synthetic [...] is prepared by the addition of
[...] (45.2mm NaF, 48.8mm [...], [...] KBr, 6.25[...] KCl,
[...] MgCl2, 268mM NaCl [...]).
[...] ddH2O with 1% [...] yeast [...]
[...] with pH adjusted to 7. [...]

[...] the organisms [...] cultured in [...]
extracted and purified as described [...].

Approximately 100 µg genomic DNA per species was
obtained and mixed together for [...]
by Sau 3A as described in [...]
so as to be compatible with similarly-prepared vectors below
(Korch 1987, Nuc Acids Res 15:3199-3220; Loftus et al. 1992
Biotechniques 12:172-175). 0.5-3.0 µg of the pooled
fragments were ligated in multiple batches to 0.5-3.0 µg of
pIJ922 and pIJ903 (Hopwood 1985, supra) vector prepared with
BamHI or XhoI. The ligated expression constructs were
transformed into the host organism, Streptomyces lividans,
strain TK64, which had been made competent by removal of cell
walls with lysozyme (Hopwood 1985, supra). Approximately
[...] unique clones were generated, amplified and stored as
mycelia in 20% glycerol and as spore suspensions in 50%
glycerol at -70°C.

To prepare the libraries for screening individual
clones, the transformed TK64 host cells were spread on 150 mm
Petri dishes filled with F10A agar. After spreading, the
plates were allowed to incubate for 21 hours at 30°C. A
selection was performed by overlaying plates with
thiostrepton at 5 ng/ml, 1 ml/plate. After 48-72 hours,
colonies were picked with sterile toothpicks and transferred
one per well to 96-well plates. Each well contained F10A
media. These inoculated master plates were placed at 30°C
for 1-4 days. The overnight master 96-well plates were used
as source plates to replicate into one or more working 96-
well plates or Omni-Trays. The master 96-well plates were
then sealed individually and frozen at -80°C. Replication
was done with a 96-pin replicator which was sterilized by
flaming before each use.

Working 96-well plates were used as source plates to
replicate the library onto a series of differential and/or
selective media and indicator plates. Selective antibiotics
included erythromycin, novobiocin and neomycin. Differential
media included F10A and R5 medium containing substrates X-
glucopyranoside and X-gluconic acid. Indicator plates
included library clones grown on F10A then overlaid with an
indicator lawn of Enterococcus faecalis (E. faecalis),
Bacillus subtilis (B. subtilis) or SOS Chromotest (with X-
gal). The results are compiled and compared to the profiles
of Streptomyces host TK64.

The clones of the library are also pre-screened by
macrodroplet encapsulation. For each pre-screen, 50,000
amplified clones of the library are encapsulated by the
method as described in Section 5.4.13.

6.3. PRE-SCREENING OF ACTINOMYCETES/E. COLI
COMBINATORIAL CHIMERIC PATHWAY EXPRESSION
LIBRARY BY MACRODROPLET ENCAPSULATION
Genomic DNA obtained from the thirty four actinomycetes
species (identified as species # 501-534) as described in
Section 6.2 was used in the preparation of a combinatorial
chimeric pathway gene expression library in a S. lividans
host. Fractions containing fragments of Sau3A-digested
genomic DNA of 2-7 kb were pooled.

Aliquots of the genomic DNA fragments are ligated to the
different promoters separately to form gene cassettes as
described in Section 5.5.3. The concatemers are formed by 8
cycles of ligation and deprotection using a different pool of
gene cassettes for each cycle, such that the resultant
concatemers each have 8 gene cassettes comprising 8 different
promoters attached to fragments of genomic DNA.

Ten micrograms of the concatemers were circularized and
ligated to 0.5 µg of SuperCos 1 vectors at the BamHI site to
form expression constructs, which were packaged in vitro for
infection of the E. coli host cells XL1-MR according to the
manufacturer's directions (Stratagene). Approximately
1,000,000 unique clones are obtained, amplified and pooled
to form an amplified library. The library was stored at
-70°C. Amplified cells are encapsulated as in Section 5.4.10,
and pre-screened as in 5.4.14.



6.4. PRE-SCREENING OF FUNGAL/SCHIZOSACCHAROMYCES
[illegible] ENCAPSULATION

Two combinatorial chimeric pathway expression libraries
were prepared using the following fungal donor organisms
obtained from ATCC: Trichoderma reesei, Fusarium [illegible],
Penicillium roquefortii, Rhizopus oligosporus, Neurospora
crassa, Phycomyces blakesleeanus, Aspergillus fumigatus,
Aspergillus flavus, Emericella heterothallica, Chaetomium
[illegible], Penicillium notatum, Penicillium chrysogenum.

Each species was cultured separately in 500 ml potato
dextrose agar (PDA; Difco) or malt extract agar (MEA; Difco)
at medium rpm for 48-72 hours. Spore inoculations of 1x10[exponent
illegible] to 1x10[exponent illegible] spores per ml were placed
into 500 ml of potato extract or malt extract broths in 1 liter
culture flasks and grown at 22°C, 225 rpm, for 48-72 hours.

Cultures were harvested by filtration through Miracloth
(Calbiochem) under vacuum. The collected mycelial masses were
washed with 2 litres of ddH2O and air-dried for 10 minutes
before freeze drying. Fungal genomic DNA and mRNA were
extracted and purified from the mycelia as described in
Sections 5.3.1 and 5.3.2. A portion of the harvested mycelia
was freeze-dried and stored at -70°C.

Fungal genomic DNA fragments were prepared as described
in Section 5.4.2. Fungal mRNA was converted into cDNA
according to standard methods (Sambrook et al., 1989; Watson
CJ & Jackson JF (1985) DNA cloning: A practical approach,
88, IRL Press). Weight equivalents of DNA fragments from
each species were pooled to yield a genomic DNA pool and a
cDNA pool. Each of these pools, containing approximately
5-10 µg of DNA, is used independently to assemble a
combinatorial chimeric pathway expression library. The
following S. pombe-compatible promoters and terminators were
generated as described in Section 5.5.2: CMV immediate/early,
SV40 early, RSV, HSV thymidine kinase, CaMV, nmt1, adh1 and
[illegible] promoters. The promoter and terminator fragments are



WO 96/34112 



PCT/US96/06003 



ligated to the cDNA and genomic DNA pools as described in
Section 5.5.4. Each gene cassette averaging 5 kb in length
[illegible] (1995, Gene, Vol. 164, pp. 173-177) as described
in Section 5.5.7. The expression constructs were transformed
into S. pombe cells via the lithium acetate method of Gietz
and Woods (R.D. Gietz and R.A. Woods, Molecular genetics of
yeast: A practical approach, chapter 3, pp 121-134). Upon
selection in the presence of the ura4 marker, 110,000 S. pombe
clones are [illegible] amplified library ready for
pre-screening. The following pre-screens are performed:
enzyme-substrate test, antimicrobial activity, antibiotic
resistance.

6.5. PRE-SCREENING OF MARINE GRAM (-)/E. COLI
LIBRARY BY PLATE REPLICATION

Marine bacteria obtained from seawater collected near
[illegible] marine [illegible] of the libraries [illegible]
redundancy, and [illegible] the array of [illegible].

The following assays were performed on the [illegible]
species of marine gram-negative/E. coli library, with the
indicated results:

Assay                      Positive species out of 37 species
Chromazurol S (CAS)        [illegible]7
Streptococcus pyogenes     0
Enterococcus faecalis      3
Proteus mirabilis          1
Sarcina aurantiaca         10
Staphylococcus aureus      6
Starch digestion           17

Of these assays, the following were selected to be
performed on the [illegible] cells of the combinatorial gene
expression library in E. coli: CAS, S. aureus, S. aurantiaca,
starch digestion.

Briefly, each of the 40 parental species was inoculated
into 5 ml of B3 medium and cultured overnight at 30°C, 300
rpm in Falcon 2059 tubes. The overnight cultures were
pelleted and the total genomic DNA extracted by standard
procedures. Genomic DNA was quantified by visualization on
an agarose gel and 5 µg DNA from each of the 40 species was
contributed to a pool totaling 200 µg. The combinatorial
natural pathway expression libraries were assembled in E.
coli as described in Section 5.1.4. This DNA was partially
digested, ligated to SuperCos 1 and packaged in λ phage for
introduction into E. coli according to the SuperCos 1
manufacturer's directions (Stratagene). This resulted in
5 x 10[exponent illegible] unique clones, which were amplified
to 7 x 10[exponent illegible]/ml cfu by standard protocols.
The amplified stock was stored in 15% glycerol at -70°C for
subsequent use.
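
The equal-mass pooling step described above (5 µg of genomic DNA from each of 40 species, 200 µg in total) can be illustrated with a short calculation. This is only a sketch: the function names and the example stock concentration of 0.5 µg/µl are hypothetical and are not taken from the procedure.

```python
def volume_for_mass_ul(conc_ug_per_ul: float, mass_ug: float = 5.0) -> float:
    """Volume (in µl) of a DNA stock that contains `mass_ug` of DNA."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_ug_per_ul


def pool_total_ug(n_species: int = 40, mass_ug: float = 5.0) -> float:
    """Total DNA mass when every species contributes the same mass."""
    return n_species * mass_ug


# 40 species at 5 µg each, as stated in the text.
total = pool_total_ug()

# Hypothetical stock concentration (assumed value, not from the text).
volume = volume_for_mass_ul(0.5)

print(f"pool total: {total} µg; volume per species at 0.5 µg/µl: {volume} µl")
```

Contributing equal mass rather than equal volume keeps any one donor species from dominating the pool when the individual DNA preparations differ in concentration.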

To prepare the libraries for screening individual
clones, the amplified library cells were spread on 150 mm
Petri dishes with 50 ml LB, 100 mg/ml ampicillin and 50 µg/ml
kanamycin. The plates were previously dried for 24 hours at
ambient temperature in the dark. The 7 x 10[exponent
illegible]/ml cfu stock was diluted in LB to 500 cfu/ml. One
ml was spread on each 150 mm plate. After spreading, the
plates were allowed to incubate overnight at 37°C. Resulting
colonies were picked with sterile toothpicks and transferred
one per well to 384-well plates. 6400 colonies were picked
and archived. Each well contained 75 µl LB, 50 µg/ml
ampicillin, 7% glycerol. The outer rows (80 wells total) were
not inoculated but were similarly filled with medium to
provide an evaporation barrier during subsequent incubation
and freezing. These inoculated master plates were placed at
37°C for 16 hours without shaking. The overnight master
384-well plates are used as a source plate to replicate into
one or more working multi-well plates or Omni-Trays. The
master 384-well plates were then sealed individually and
frozen at -80°C.

Replication was done with a multi-pin replicator. Before and
after each use, the 384-pin replicator was dipped
sequentially into bleach for 20 seconds, water for 30
seconds, then ethanol for 5 seconds before flaming.
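
The plating and archiving arithmetic in the paragraph above can be sketched as follows. The figures of 500 cfu/ml, 6400 colonies, 384 wells and 80 uninoculated wells come from the text; the stock titer is not legible in the source, so the value of 7e8 cfu/ml used below is purely an assumed placeholder, and all function names are hypothetical.

```python
import math

WELLS_PER_PLATE = 384
BLANK_WELLS = 80  # uninoculated evaporation-barrier wells, as stated in the text


def dilution_factor(stock_cfu_per_ml: float, target_cfu_per_ml: float = 500.0) -> float:
    """Fold dilution needed to bring the stock down to the plating titer."""
    if stock_cfu_per_ml < target_cfu_per_ml:
        raise ValueError("stock is already below the target titer")
    return stock_cfu_per_ml / target_cfu_per_ml


def agar_plates_needed(colonies: int, cfu_per_plate: int = 500) -> int:
    """Minimum number of agar plates, assuming every plated cfu is pickable."""
    return math.ceil(colonies / cfu_per_plate)


def master_plates_needed(colonies: int,
                         wells: int = WELLS_PER_PLATE,
                         blank: int = BLANK_WELLS) -> int:
    """Number of multi-well master plates when `blank` wells stay uninoculated."""
    usable = wells - blank
    return math.ceil(colonies / usable)


# ASSUMPTION: 7e8 cfu/ml is a placeholder titer; the exponent is illegible in the source.
fold = dilution_factor(7e8)
agar = agar_plates_needed(6400)
masters = master_plates_needed(6400)

print(f"dilute {fold:.0f}-fold; at least {agar} agar plates; {masters} master plates")
```

With one colony per well and 304 usable wells per plate, archiving 6400 colonies requires 22 master plates under these assumptions.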

Working multi-well plates or Omni-Trays were used as
source plates to replicate the DNA libraries onto a series of
differential and/or selective media ([illegible] media, CAS,
or antimicrobial lawns). The results were compiled and
compared to the profiles of the wild type marine bacteria
used to construct the DNA library.

Clones were isolated that were positive for starch
digestion ability. These clones were tested for the ability
to inhibit the growth of S. aureus or [illegible]. A clone
was found to inhibit the growth of [illegible]. This clone
was subjected to further analysis [illegible] sequence
analysis, and was found to contain DNA sequences encoding
proteins homologous to those in a polyketide synthesis
pathway. Figure 10 shows the [illegible] predicted amino acid
sequence of a DNA sequence derived from clone cxc-A^.0 with
the actinorhodin dehydrase of Streptomyces coelicolor.

The active component from this clone is further analyzed
[illegible]. The DNA sequence contained in this clone was
further [illegible] sequences were used in the PCR as positive
control. The reactions were performed by standard method
using a set of [illegible] (see Figure 11). The PCR reactions
were repeated with genomic DNA of individual parental species.
Figure 12 shows that genomic DNA derived from species #6 from
Pool 1, species #16 from Pool 2 and species #31 from Pool 3
were positive in the PCR reaction. This suggested that the
identified DNA sequence was likely derived from any of these
3 species of marine bacteria.

Thus, the results show that the combinatorial gene
expression library contains clones carrying genetic material
derived from marine bacteria that encodes metabolic pathways
of interest. Furthermore, it is shown that such clones in
the library can be identified and isolated by pre-screening.

6.6. PRE-SCREENING OF MARINE GRAM (-)/E. COLI
LIBRARY BY MACRODROPLET ENCAPSULATION

30,000 clones were encapsulated by taking sodium
alginate (Protanal LF 20/60, Pronova Biopolymer, Drammen,
Norway) and dissolving it in 100 mL of sterile water at a
concentration of 1% using an overhead mixer at 2000 rpm.
One ml of library suspension containing 30,000 cells was
added so as to embed 1-5 clones per droplet. The mixture was
allowed to sit for 30 minutes to degas. The mixture was then
extruded through a 25 gauge needle. These fluids were
dropped into a 0.5 L gently stirring beaker of 135 mM calcium
chloride. Droplets were allowed to harden for 10 minutes and
then were transferred to a sterile flask and the calcium
chloride removed and replaced with LB/Amp media and a
substrate, X-glucosaminide, at 80 µg/ml. Other substrates
were X-acetate, X-glucopyranoside, X-gal and specific custom
substrates relevant to polyketide pathways. Flasks
containing the droplets were then shaken at 30°C overnight
and examined the following morning for positive clones
indicated by the presence of blue colonies. Clones are also
co-encapsulated with indicator cells as described in 5.4.14.
Indicator cells include S. aureus and S. aurantiaca.
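
The target of 1-5 clones per droplet can be related to the cell density by treating droplet occupancy as a Poisson process. The sketch below is a minimal illustration under stated assumptions: the Poisson model itself, the total volume of 101 ml (100 mL alginate plus 1 ml suspension) and especially the droplet volume of 10 µl are assumptions, since the source does not give a droplet size; all names are hypothetical.

```python
import math


def mean_clones_per_droplet(n_cells: float,
                            total_volume_ml: float,
                            droplet_volume_ul: float) -> float:
    """Expected number of cells per droplet for a well-mixed suspension."""
    cells_per_ml = n_cells / total_volume_ml
    return cells_per_ml * (droplet_volume_ul / 1000.0)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells in a droplet with mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fraction_in_range(lam: float, low: int = 1, high: int = 5) -> float:
    """Fraction of droplets expected to hold between low and high cells."""
    return sum(poisson_pmf(k, lam) for k in range(low, high + 1))


# 30,000 cells in about 101 ml, as in the text.
# ASSUMPTION: 10 µl per droplet; the source does not state a droplet volume.
lam = mean_clones_per_droplet(30_000, 101.0, 10.0)
in_range = fraction_in_range(lam)
empty = poisson_pmf(0, lam)

print(f"mean {lam:.2f} clones/droplet; {in_range:.1%} hold 1-5; {empty:.1%} empty")
```

Under these assumptions the mean occupancy is close to 3 clones per droplet, so most droplets fall in the stated 1-5 range; a smaller or larger droplet volume shifts the mean proportionally.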

Droplets were placed in a single layer in a large clear
tray and scanned by eye. One X-glucosaminide positive was
recovered, resuspended in 15% glycerol and stored at -70°C.
Other positive colonies are removed and placed in 96-well
master plates containing LB/Amp and 50 mM sodium citrate pH
[illegible], to dissolve the matrix, and allowed to grow at
37°C overnight. These overnight master 96-well plates are
used as a source plate to replicate into one or more working
multi-well plates or Omni-Trays. The master 96-well plates
are then sealed individually and frozen at -80°C. These
clones are either sent for specific testing of the product
or sent through another round of pre-screening or screening.

Having thus disclosed exemplary embodiments of the 
present invention, it should be noted by those skilled



WHAT IS CLAIMED IS:

1. A combinatorial gene expression library,
comprising a pool of expression constructs, each expression
construct containing a cDNA or genomic DNA fragment derived
from a plurality of species of donor organisms, in which the
cDNA or genomic DNA fragment is operably-associated with one
or more regulatory regions that drives expression of genes
encoded by the cDNA or genomic DNA fragment in an appropriate
host organism.

2. A combinatorial chimeric pathway gene expression
library, comprising a pool of expression constructs, each
expression construct containing randomly concatenated cDNA or
genomic DNA fragments derived from one or more species of
donor organisms, in which the concatenated cDNA or genomic
DNA fragments are operably-associated with one or more
regulatory regions that drive expression of genes encoded by
the concatenated cDNA or genomic DNA fragments in an
appropriate host organism.

3. A biased combinatorial gene expression library,
comprising a pool of expression constructs, each expression
construct containing cDNA or genomic DNA fragments some of
which are preselected from a plurality of species of donor
organisms for a specific property, in which the cDNA or
genomic DNA fragments are operably-associated with one or
more regulatory regions that drive expression of genes
encoded by the cDNA or genomic DNA fragments in an
appropriate host organism.



4. The gene expression library of Claim 1, 2 or 3
in which the expression construct comprises a plasmid vector,
a phage, a viral vector, a cosmid vector, or an artificial
chromosome.

5. The gene expression library of Claim 4 in which
the vector is a shuttle vector capable of replicating in
different host cell species or strains.

6. The gene expression library of Claim 1, 2 or 3
in which the cDNA or genomic DNA fragments are derived from
bacteria, fungi, algae, lichens, plants, protozoans,
metazoans, coelenterates, insects, mollusca, sponges, worms,
amphibians, reptiles, tunicates, birds or mammals.

7. The gene expression library of Claim 1, 2 or 3
in which the donor organisms comprise a mixture of
terrestrial microorganisms or marine microorganisms, or a
mixture of terrestrial microorganisms and marine
microorganisms.

8. The gene expression library of Claim 1, 2 or 3
in which the cDNA or genomic DNA fragments are derived from
an environmental sample.
9. The gene expression library of Claim 1 in which
the cDNA or genomic DNA fragments comprise one or more
operons, or portions thereof, of the donor microorganism.

10. The gene expression library of Claim 9 in which
the operon or portions thereof encodes a complete or partial
metabolic pathway.

11. The gene expression library of Claim 1, 2 or 3
in which each expression construct is contained in a host
cell.

12. The gene expression library of Claim 11 in which
the host cells have been modified by the introduction,
induction or overproduction of active efflux systems.

13. The gene expression library of Claim 11 in which
the host cells have been modified by the introduction,
induction or overproduction of a known metabolic pathway of
interest or portion thereof.

14. The gene expression library of Claim 11 in which
the host cell is a bacterium, fungus, plant cell, insect
cell, or animal cell.

15. The gene expression library of Claim 14 in which
the host cell is Escherichia coli, Bacillus subtilis,
Streptomyces lividans, Streptomyces coelicolor, Saccharomyces
cerevisiae, Schizosaccharomyces pombe, Spodoptera frugiperda,
Aspergillus nidulans, Arabidopsis thaliana, Nicotiana
tabacum, COS cells, 293 cells, VERO cells, NIH/3T3 cells or
CHO cells.

16. The gene expression library of Claim 11 in which
the host cells further contain a reporter regimen tailored to
identify clones in the library that are expressing desirable
metabolic pathways or compounds.

17. The gene expression library of Claim 16 in which
the reporter regimen comprises DNA encoding a reporter gene
operably-associated with a regulatory region that is
inducible or modulated by the desirable metabolic pathways or
compounds expressed by the host cell.

18. The gene expression library of Claim 11 in which
the host cells are in a matrix containing a reporter regimen
tailored to identify clones in the library that are
expressing desirable metabolic pathways or compounds.

19. A method for making a combinatorial gene
expression library, comprising ligating a DNA vector to cDNA
or genomic DNA fragments obtained from a plurality of species
of donor organisms to generate a library of expression
constructs in which genes contained in the cDNA or genomic
DNA fragments are operably-associated with their native or
exogenous regulatory regions which drive expression of the
genes in an appropriate host cell.

20. A method for making a chimeric pathway gene
expression library, comprising randomly concatenating cDNA or
genomic DNA fragments obtained from one or more species of
donor organisms, and ligating the concatenated DNA fragments
to a DNA vector to generate a library of expression
constructs in which genes contained in the cDNA or genomic
DNA fragments are operably-associated with their native or
exogenous regulatory regions which drive expression of the
genes in an appropriate host cell.

21. A method for making a biased combinatorial gene
expression library, comprising ligating a DNA vector to cDNA
or genomic DNA fragments obtained from one or more species of
donor organisms, some of which are selected for a specific
property, to generate a library of expression constructs in
which genes contained in the cDNA or genomic DNA fragments
are operably-associated with their native or exogenous
regulatory regions which drive expression of the genes in an
appropriate host cell.

22. The method of Claim 19, 20 or 21 in which the
DNA vector is a plasmid vector, a phage, a viral vector, a
cosmid vector, or an artificial chromosome.

23. The method of Claim 22 in which the vector is a
shuttle vector capable of replicating in different host cell
species or strains.

24. The method of Claim 19, 20 or 21 in which the
cDNA or genomic DNA fragments are derived from bacteria,
fungi, algae, lichens, plants, protozoans, metazoans,
coelenterates, insects, mollusca, sponges, worms, amphibians,
reptiles, tunicates, birds or mammals.

25. The method of Claim 19, 20 or 21 in which the
donor organisms comprise a mixture of terrestrial
microorganisms or marine microorganisms, or a mixture of
terrestrial microorganisms and marine microorganisms.

26. The method of Claim 19, 20 or 21 in which the
cDNA or genomic DNA fragments are derived from an
environmental sample.

27. The method of Claim 25 in which the cDNA or
genomic DNA fragments comprise at least an operon, or
portions thereof, of the donor microorganisms.

28. The method of Claim 27 in which the operon
encodes a complete or partial metabolic pathway.

29. The method of Claim 19, 20 or 21 further
comprising introducing the library of expression constructs
into a host cell.

30. The method of Claim 29 in which the host cell is
a bacterium, fungus, plant cell, insect cell, or animal cell.

31. The method of Claim 30 in which the host cell is
Escherichia coli, Bacillus subtilis, Streptomyces lividans,
Streptomyces coelicolor, Saccharomyces cerevisiae,
Schizosaccharomyces pombe, Spodoptera frugiperda, Aspergillus
nidulans, Arabidopsis thaliana, Nicotiana tabacum, COS cells,
293 cells, VERO cells, NIH/3T3 cells or CHO cells.



32. The method of Claim 29 in which the host cells
further contain a reporter regimen tailored to identify
clones in the library that are expressing desirable metabolic
pathways or compounds.

33. The method of Claim 32 in which the reporter
regimen comprises DNA encoding a reporter gene operably-
associated with a regulatory region that is inducible or
modulated by the desirable metabolic pathways or compounds
expressed by the host cell.

34. The method of Claim 29 in which the host cells
have been modified by the introduction, induction or
overproduction of active efflux systems.

35. The method of Claim 29 in which the host cells
have been modified by the introduction, induction or
overproduction of a known metabolic pathway of interest or
portion thereof.

36. A method for identifying a compound of interest
in a gene expression library, comprising:
(a) culturing the gene expression library of Claim
11; and
(b) screening the gene expression library for a
clone which produces the compound.

37. A method for screening a gene expression library
for a compound of interest, comprising:
(a) culturing the gene expression library of Claim
16; and
(b) detecting a signal generated by the reporter
regimen;
thereby identifying a clone which produces the compound.

38. A method for screening a gene expression library
for a compound of interest, comprising:
(a) culturing the gene expression library of Claim
18; and
(b) detecting a signal generated by the reporter
regimen;
thereby identifying a clone which produces the compound.



39. A method for producing a compound of interest,
comprising:
(a) culturing the clone identified in Claim 36; and
(b) recovering the compound from the culture of the
identified clone.

40. A method for producing a compound of interest,
comprising:
(a) culturing the clone identified in Claim 37; and
(b) recovering the compound from the culture of the
identified clone.

41. A method for producing a compound of interest,
comprising:
(a) culturing the clone identified in Claim 38; and
(b) recovering the compound from the culture of the
identified clone.



FIG. 6B (restriction sites shown: SacI, NcoI, BstXI, NotI)